2024-03-09 17:05:59,774 INFO [train.py:1065] (3/4) Training started
2024-03-09 17:05:59,774 INFO [train.py:1075] (3/4) Device: cuda:3
2024-03-09 17:05:59,855 INFO [lexicon.py:168] (3/4) Loading pre-compiled data/lang_char/Linv.pt
2024-03-09 17:05:59,869 INFO [train.py:1086] (3/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '2989b0b1186fa6022932804f5b39fbb2781ebf42', 'k2-git-date': 'Fri Nov 24 11:34:10 2023', 'lhotse-version': '1.22.0.dev+git.d8ed1bbb.dirty', 'torch-version': '1.11.0+cu102', 'torch-cuda-available': True, 'torch-cuda-version': '10.2', 'python-version': '3.9', 'icefall-git-branch': 'dev/mdcc', 'icefall-git-sha1': '8b7ca604-clean', 'icefall-git-date': 'Sat Mar 9 14:09:58 2024', 'icefall-path': '/star-home/jinzengrui/lib/miniconda3/envs/dev39/lib/python3.9/site-packages/icefall-1.0-py3.9.egg', 'k2-path': '/star-home/jinzengrui/lib/miniconda3/envs/dev39/lib/python3.9/site-packages/k2-1.24.4.dev20231207+cuda10.2.torch1.11.0-py3.9-linux-x86_64.egg/k2/__init__.py', 'lhotse-path': '/star-home/jinzengrui/lib/miniconda3/envs/dev39/lib/python3.9/site-packages/lhotse-1.22.0.dev0+git.d8ed1bbb.dirty-py3.9.egg/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-2-1207150844-f49d8c4f4-c49d5', 'IP address': '10.177.22.19'}, 'world_size': 4, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 50, 'start_epoch': 31, 'start_batch': 0, 'exp_dir': PosixPath('zipformer/exp'), 'lang_dir': PosixPath('data/lang_char'), 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'context_size': 1, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 1000, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'blank_id': 0, 'vocab_size': 4852}
2024-03-09 17:05:59,869 INFO [train.py:1088] (3/4) About to create model
2024-03-09 17:06:00,577 INFO [train.py:1092] (3/4) Number of model parameters: 74470867
2024-03-09 17:06:00,578 INFO [checkpoint.py:112] (3/4) Loading checkpoint from zipformer/exp/epoch-30.pt
2024-03-09 17:06:07,828 INFO [train.py:1107] (3/4) Using DDP
2024-03-09 17:06:08,429 INFO [train.py:1119] (3/4) Loading optimizer state dict
2024-03-09 17:06:09,483 INFO [train.py:1127] (3/4) Loading scheduler state dict
2024-03-09 17:06:09,484 INFO [asr_datamodule.py:368] (3/4) About to get train cuts
2024-03-09 17:06:09,530 INFO [asr_datamodule.py:376] (3/4) About to get valid cuts
2024-03-09 17:06:09,532 INFO [asr_datamodule.py:195] (3/4) About to get Musan cuts
2024-03-09 17:06:11,951 INFO [asr_datamodule.py:200] (3/4) Enable MUSAN
2024-03-09 17:06:11,951 INFO [asr_datamodule.py:223] (3/4) Enable SpecAugment
2024-03-09 17:06:11,951 INFO [asr_datamodule.py:224] (3/4) Time warp factor: 80
2024-03-09 17:06:11,952 INFO [asr_datamodule.py:234] (3/4) Num frame mask: 10
2024-03-09 17:06:11,952 INFO [asr_datamodule.py:247] (3/4) About to create train dataset
2024-03-09 17:06:11,952 INFO [asr_datamodule.py:273] (3/4) Using DynamicBucketingSampler.
2024-03-09 17:06:12,773 INFO [asr_datamodule.py:290] (3/4) About to create train dataloader
2024-03-09 17:06:12,773 INFO [asr_datamodule.py:315] (3/4) About to create dev dataset
2024-03-09 17:06:13,100 INFO [asr_datamodule.py:332] (3/4) About to create dev dataloader
2024-03-09 17:06:13,100 INFO [train.py:1205] (3/4) Loading grad scaler state dict
2024-03-09 17:06:53,813 INFO [train.py:997] (3/4) Epoch 31, batch 0, loss[loss=0.1304, simple_loss=0.2259, pruned_loss=0.0175, over 22859.00 frames. ], tot_loss[loss=0.1304, simple_loss=0.2259, pruned_loss=0.0175, over 22859.00 frames. ], batch size: 608, lr: 1.41e-02, grad_scale: 64.0
2024-03-09 17:06:53,813 INFO [train.py:1020] (3/4) Computing validation loss
2024-03-09 17:07:03,243 INFO [train.py:1029] (3/4) Epoch 31, validation: loss=0.2089, simple_loss=0.3019, pruned_loss=0.05794, over 452978.00 frames.
2024-03-09 17:07:03,244 INFO [train.py:1030] (3/4) Maximum memory allocated so far is 24707MB
2024-03-09 17:07:04,460 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.87 vs. limit=15.0
2024-03-09 17:07:07,758 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.33 vs. limit=15.0
2024-03-09 17:07:17,162 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.03 vs. limit=15.0
2024-03-09 17:07:52,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=31800.0, ans=0.125
2024-03-09 17:07:54,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=31800.0, ans=0.05
2024-03-09 17:07:58,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=31800.0, ans=0.125
2024-03-09 17:08:06,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=31866.666666666668, ans=0.125
2024-03-09 17:08:12,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=31866.666666666668, ans=0.125
2024-03-09 17:08:17,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=31866.666666666668, ans=0.125
2024-03-09 17:08:21,814 INFO [train.py:997] (3/4) Epoch 31, batch 50, loss[loss=0.1526, simple_loss=0.2376, pruned_loss=0.03384, over 23894.00 frames. ], tot_loss[loss=0.1462, simple_loss=0.2352, pruned_loss=0.02855, over 1067497.66 frames. ], batch size: 153, lr: 1.41e-02, grad_scale: 64.0
2024-03-09 17:08:22,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=31933.333333333332, ans=0.1
2024-03-09 17:08:22,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=31933.333333333332, ans=0.125
2024-03-09 17:08:25,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=31933.333333333332, ans=0.035
2024-03-09 17:08:49,153 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.87 vs. limit=12.0
2024-03-09 17:08:54,736 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.909e+01 7.298e+01 7.941e+01 8.893e+01 1.039e+02, threshold=1.588e+02, percent-clipped=0.0
2024-03-09 17:09:14,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=32066.666666666668, ans=0.125
2024-03-09 17:09:20,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=32133.333333333332, ans=0.125
2024-03-09 17:09:22,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=32133.333333333332, ans=0.2
2024-03-09 17:09:35,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=32200.0, ans=0.0038695652173913048
2024-03-09 17:09:48,298 INFO [train.py:997] (3/4) Epoch 31, batch 100, loss[loss=0.1465, simple_loss=0.2404, pruned_loss=0.02631, over 24124.00 frames. ], tot_loss[loss=0.1466, simple_loss=0.2362, pruned_loss=0.02852, over 1889551.81 frames. ], batch size: 326, lr: 1.40e-02, grad_scale: 64.0
2024-03-09 17:10:07,962 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.69 vs. limit=15.0
2024-03-09 17:10:57,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=32533.333333333332, ans=0.003797101449275363
2024-03-09 17:11:07,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=32600.0, ans=0.95
2024-03-09 17:11:08,727 INFO [train.py:997] (3/4) Epoch 31, batch 150, loss[loss=0.144, simple_loss=0.2385, pruned_loss=0.02476, over 24218.00 frames. ], tot_loss[loss=0.1463, simple_loss=0.236, pruned_loss=0.0283, over 2512128.61 frames. ], batch size: 295, lr: 1.40e-02, grad_scale: 64.0
2024-03-09 17:12:00,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=32653.333333333332, ans=0.125
2024-03-09 17:12:06,856 INFO [train.py:997] (3/4) Epoch 32, batch 0, loss[loss=0.1392, simple_loss=0.2259, pruned_loss=0.02618, over 24189.00 frames. ], tot_loss[loss=0.1392, simple_loss=0.2259, pruned_loss=0.02618, over 24189.00 frames. ], batch size: 188, lr: 1.38e-02, grad_scale: 64.0
2024-03-09 17:12:06,857 INFO [train.py:1020] (3/4) Computing validation loss
2024-03-09 17:12:16,508 INFO [train.py:1029] (3/4) Epoch 32, validation: loss=0.2101, simple_loss=0.3027, pruned_loss=0.0588, over 452978.00 frames.
2024-03-09 17:12:16,509 INFO [train.py:1030] (3/4) Maximum memory allocated so far is 27673MB
2024-03-09 17:12:18,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=32653.333333333332, ans=0.125
2024-03-09 17:12:20,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=32653.333333333332, ans=0.125
2024-03-09 17:12:32,055 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.203e+01 7.071e+01 7.685e+01 8.593e+01 1.169e+02, threshold=1.537e+02, percent-clipped=0.0
2024-03-09 17:12:57,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=32786.666666666664, ans=0.0
2024-03-09 17:13:04,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=32853.333333333336, ans=0.125
2024-03-09 17:13:04,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=32853.333333333336, ans=0.125
2024-03-09 17:13:27,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=32920.0, ans=0.1
2024-03-09 17:13:30,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=32920.0, ans=0.2
2024-03-09 17:13:34,611 INFO [train.py:997] (3/4) Epoch 32, batch 50, loss[loss=0.1395, simple_loss=0.2327, pruned_loss=0.02316, over 24223.00 frames. ], tot_loss[loss=0.1444, simple_loss=0.2332, pruned_loss=0.02779, over 1062126.86 frames. ], batch size: 295, lr: 1.38e-02, grad_scale: 64.0
2024-03-09 17:14:03,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=33053.333333333336, ans=0.02
2024-03-09 17:14:19,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=33120.0, ans=0.003669565217391305
2024-03-09 17:14:28,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=33186.666666666664, ans=0.125
2024-03-09 17:14:59,393 INFO [train.py:997] (3/4) Epoch 32, batch 100, loss[loss=0.1445, simple_loss=0.228, pruned_loss=0.03048, over 23594.00 frames. ], tot_loss[loss=0.1437, simple_loss=0.2328, pruned_loss=0.02729, over 1881462.97 frames. ], batch size: 128, lr: 1.37e-02, grad_scale: 64.0
2024-03-09 17:15:15,496 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.885e+01 7.174e+01 7.568e+01 8.159e+01 1.038e+02, threshold=1.514e+02, percent-clipped=0.0
2024-03-09 17:15:16,785 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.84 vs. limit=15.0
2024-03-09 17:15:24,306 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.99 vs. limit=10.0
2024-03-09 17:15:29,949 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.29 vs. limit=15.0
2024-03-09 17:15:32,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=33453.333333333336, ans=0.125
2024-03-09 17:15:38,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=33453.333333333336, ans=0.125
2024-03-09 17:15:44,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=33453.333333333336, ans=0.125
2024-03-09 17:15:55,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=33520.0, ans=0.125
2024-03-09 17:16:19,711 INFO [train.py:997] (3/4) Epoch 32, batch 150, loss[loss=0.1458, simple_loss=0.236, pruned_loss=0.02785, over 24173.00 frames. ], tot_loss[loss=0.1437, simple_loss=0.2326, pruned_loss=0.02735, over 2518627.96 frames. ], batch size: 217, lr: 1.37e-02, grad_scale: 64.0
2024-03-09 17:17:08,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=33706.666666666664, ans=0.1
2024-03-09 17:17:14,939 INFO [train.py:997] (3/4) Epoch 33, batch 0, loss[loss=0.1338, simple_loss=0.2188, pruned_loss=0.02447, over 24247.00 frames. ], tot_loss[loss=0.1338, simple_loss=0.2188, pruned_loss=0.02447, over 24247.00 frames. ], batch size: 217, lr: 1.35e-02, grad_scale: 64.0
2024-03-09 17:17:14,940 INFO [train.py:1020] (3/4) Computing validation loss
2024-03-09 17:17:24,826 INFO [train.py:1029] (3/4) Epoch 33, validation: loss=0.2104, simple_loss=0.3043, pruned_loss=0.05821, over 452978.00 frames.
2024-03-09 17:17:24,826 INFO [train.py:1030] (3/4) Maximum memory allocated so far is 27673MB
2024-03-09 17:17:30,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=33706.666666666664, ans=0.2
2024-03-09 17:17:51,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=33773.333333333336, ans=0.125
2024-03-09 17:17:53,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=33773.333333333336, ans=0.125
2024-03-09 17:18:02,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=33840.0, ans=0.125
2024-03-09 17:18:15,959 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.76 vs. limit=15.0
2024-03-09 17:18:27,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=33973.333333333336, ans=10.0
2024-03-09 17:18:29,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=33973.333333333336, ans=0.125
2024-03-09 17:18:40,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=33973.333333333336, ans=0.1
2024-03-09 17:18:42,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=34040.0, ans=0.0
2024-03-09 17:18:43,168 INFO [train.py:997] (3/4) Epoch 33, batch 50, loss[loss=0.1443, simple_loss=0.2367, pruned_loss=0.02593, over 24153.00 frames. ], tot_loss[loss=0.1413, simple_loss=0.2307, pruned_loss=0.02598, over 1070736.99 frames. ], batch size: 345, lr: 1.35e-02, grad_scale: 64.0
2024-03-09 17:18:45,940 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.81 vs. limit=6.0
2024-03-09 17:18:46,184 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.045e+01 7.058e+01 7.697e+01 8.414e+01 1.529e+02, threshold=1.539e+02, percent-clipped=1.0
2024-03-09 17:18:49,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=34040.0, ans=0.125
2024-03-09 17:19:23,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=34173.333333333336, ans=0.125
2024-03-09 17:19:28,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=34173.333333333336, ans=0.125
2024-03-09 17:19:53,913 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.61 vs. limit=15.0
2024-03-09 17:20:08,479 INFO [train.py:997] (3/4) Epoch 33, batch 100, loss[loss=0.14, simple_loss=0.2332, pruned_loss=0.0234, over 24264.00 frames. ], tot_loss[loss=0.1423, simple_loss=0.2313, pruned_loss=0.02668, over 1886315.86 frames. ], batch size: 267, lr: 1.35e-02, grad_scale: 64.0
2024-03-09 17:20:19,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=34373.333333333336, ans=0.125
2024-03-09 17:20:21,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=34373.333333333336, ans=0.125
2024-03-09 17:20:22,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=34440.0, ans=0.1
2024-03-09 17:20:27,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=34440.0, ans=0.0033826086956521735
2024-03-09 17:20:59,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=34573.333333333336, ans=0.003353623188405797
2024-03-09 17:21:28,186 INFO [train.py:997] (3/4) Epoch 33, batch 150, loss[loss=0.1482, simple_loss=0.2403, pruned_loss=0.02803, over 24108.00 frames. ], tot_loss[loss=0.1436, simple_loss=0.2334, pruned_loss=0.02687, over 2524659.63 frames. ], batch size: 366, lr: 1.34e-02, grad_scale: 64.0
2024-03-09 17:21:31,130 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.628e+01 7.574e+01 8.231e+01 9.009e+01 1.365e+02, threshold=1.646e+02, percent-clipped=0.0
2024-03-09 17:21:37,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=34706.666666666664, ans=0.0033246376811594206
2024-03-09 17:22:22,790 INFO [train.py:997] (3/4) Epoch 34, batch 0, loss[loss=0.1447, simple_loss=0.2278, pruned_loss=0.03075, over 24317.00 frames. ], tot_loss[loss=0.1447, simple_loss=0.2278, pruned_loss=0.03075, over 24317.00 frames. ], batch size: 208, lr: 1.32e-02, grad_scale: 64.0
2024-03-09 17:22:22,791 INFO [train.py:1020] (3/4) Computing validation loss
2024-03-09 17:22:32,283 INFO [train.py:1029] (3/4) Epoch 34, validation: loss=0.2117, simple_loss=0.3053, pruned_loss=0.0591, over 452978.00 frames.
2024-03-09 17:22:32,284 INFO [train.py:1030] (3/4) Maximum memory allocated so far is 27673MB
2024-03-09 17:22:37,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=34760.0, ans=0.125
2024-03-09 17:22:37,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=34760.0, ans=0.0
2024-03-09 17:23:03,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=34893.333333333336, ans=0.0
2024-03-09 17:23:23,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=34960.0, ans=0.125
2024-03-09 17:23:31,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=34960.0, ans=0.003269565217391304
2024-03-09 17:23:49,725 INFO [train.py:997] (3/4) Epoch 34, batch 50, loss[loss=0.1437, simple_loss=0.2395, pruned_loss=0.02396, over 23916.00 frames. ], tot_loss[loss=0.1412, simple_loss=0.2305, pruned_loss=0.02589, over 1068620.07 frames. ], batch size: 387, lr: 1.32e-02, grad_scale: 128.0
2024-03-09 17:24:15,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=35160.0, ans=0.0
2024-03-09 17:24:17,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=35160.0, ans=0.003226086956521739
2024-03-09 17:24:21,400 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.82 vs. limit=15.0
2024-03-09 17:24:26,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=35226.666666666664, ans=0.09899494936611666
2024-03-09 17:24:31,974 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.96 vs. limit=22.5
2024-03-09 17:24:37,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=35226.666666666664, ans=0.125
2024-03-09 17:24:46,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=35293.333333333336, ans=0.1
2024-03-09 17:24:54,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=35293.333333333336, ans=0.0
2024-03-09 17:25:04,712 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.790e+01 6.987e+01 7.379e+01 8.041e+01 1.553e+02, threshold=1.476e+02, percent-clipped=0.0
2024-03-09 17:25:13,959 INFO [train.py:997] (3/4) Epoch 34, batch 100, loss[loss=0.1508, simple_loss=0.2387, pruned_loss=0.03142, over 24214.00 frames. ], tot_loss[loss=0.1421, simple_loss=0.2317, pruned_loss=0.02628, over 1886370.96 frames. ], batch size: 198, lr: 1.32e-02, grad_scale: 128.0
2024-03-09 17:25:31,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=35493.333333333336, ans=0.125
2024-03-09 17:25:34,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=35493.333333333336, ans=0.125
2024-03-09 17:25:44,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=35560.0, ans=0.1
2024-03-09 17:26:24,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=35693.333333333336, ans=0.125
2024-03-09 17:26:32,970 INFO [train.py:997] (3/4) Epoch 34, batch 150, loss[loss=0.1418, simple_loss=0.2355, pruned_loss=0.024, over 24099.00 frames. ], tot_loss[loss=0.1434, simple_loss=0.2326, pruned_loss=0.02708, over 2530717.88 frames. ], batch size: 345, lr: 1.32e-02, grad_scale: 128.0
2024-03-09 17:27:26,418 INFO [train.py:997] (3/4) Epoch 35, batch 0, loss[loss=0.1572, simple_loss=0.2507, pruned_loss=0.03183, over 23655.00 frames. ], tot_loss[loss=0.1572, simple_loss=0.2507, pruned_loss=0.03183, over 23655.00 frames. ], batch size: 485, lr: 1.30e-02, grad_scale: 128.0
2024-03-09 17:27:26,418 INFO [train.py:1020] (3/4) Computing validation loss
2024-03-09 17:27:38,585 INFO [train.py:1029] (3/4) Epoch 35, validation: loss=0.2098, simple_loss=0.3027, pruned_loss=0.05849, over 452978.00 frames.
2024-03-09 17:27:38,586 INFO [train.py:1030] (3/4) Maximum memory allocated so far is 27673MB
2024-03-09 17:27:56,501 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.71 vs. limit=15.0
2024-03-09 17:28:08,890 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.73 vs. limit=15.0
2024-03-09 17:28:15,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=35946.666666666664, ans=0.2
2024-03-09 17:28:16,328 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.37 vs. limit=15.0
2024-03-09 17:28:25,599 INFO [scaling.py:1119] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-03-09 17:28:30,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=36013.333333333336, ans=0.5
2024-03-09 17:28:34,546 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.276e+01 7.140e+01 7.953e+01 8.912e+01 1.249e+02, threshold=1.591e+02, percent-clipped=0.0
2024-03-09 17:28:58,519 INFO [train.py:997] (3/4) Epoch 35, batch 50, loss[loss=0.1423, simple_loss=0.2305, pruned_loss=0.02702, over 24238.00 frames. ], tot_loss[loss=0.1403, simple_loss=0.2286, pruned_loss=0.02598, over 1073848.84 frames. ], batch size: 241, lr: 1.30e-02, grad_scale: 128.0
2024-03-09 17:28:58,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=36146.666666666664, ans=0.125
2024-03-09 17:29:24,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=36213.333333333336, ans=0.0
2024-03-09 17:29:26,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=36213.333333333336, ans=0.125
2024-03-09 17:29:48,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=36346.666666666664, ans=0.125
2024-03-09 17:30:05,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=36413.333333333336, ans=0.002953623188405796
2024-03-09 17:30:15,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=36413.333333333336, ans=0.002953623188405796
2024-03-09 17:30:18,437 INFO [train.py:997] (3/4) Epoch 35, batch 100, loss[loss=0.1327, simple_loss=0.219, pruned_loss=0.02318, over 23649.00 frames. ], tot_loss[loss=0.1402, simple_loss=0.2289, pruned_loss=0.02575, over 1901305.08 frames. ], batch size: 128, lr: 1.29e-02, grad_scale: 128.0
2024-03-09 17:30:36,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=36546.666666666664, ans=0.035
2024-03-09 17:30:37,534 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.78 vs. limit=15.0
2024-03-09 17:30:49,824 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0
2024-03-09 17:31:05,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=36680.0, ans=0.125
2024-03-09 17:31:18,165 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.800e+01 7.204e+01 7.789e+01 8.601e+01 1.817e+02, threshold=1.558e+02, percent-clipped=1.0
2024-03-09 17:31:38,613 INFO [train.py:997] (3/4) Epoch 35, batch 150, loss[loss=0.1292, simple_loss=0.2144, pruned_loss=0.02196, over 19900.00 frames. ], tot_loss[loss=0.1413, simple_loss=0.2305, pruned_loss=0.02599, over 2517737.96 frames. ], batch size: 60, lr: 1.29e-02, grad_scale: 64.0
2024-03-09 17:32:32,816 INFO [train.py:997] (3/4) Epoch 36, batch 0, loss[loss=0.1595, simple_loss=0.2567, pruned_loss=0.03116, over 23607.00 frames. ], tot_loss[loss=0.1595, simple_loss=0.2567, pruned_loss=0.03116, over 23607.00 frames. ], batch size: 486, lr: 1.27e-02, grad_scale: 64.0
2024-03-09 17:32:32,817 INFO [train.py:1020] (3/4) Computing validation loss
2024-03-09 17:32:40,710 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([0.9559, 2.3706, 2.5634, 2.5911], device='cuda:3')
2024-03-09 17:32:42,864 INFO [train.py:1029] (3/4) Epoch 36, validation: loss=0.212, simple_loss=0.307, pruned_loss=0.05847, over 452978.00 frames.
2024-03-09 17:32:42,864 INFO [train.py:1030] (3/4) Maximum memory allocated so far is 27673MB
2024-03-09 17:32:52,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=36866.666666666664, ans=0.125
2024-03-09 17:32:54,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=36866.666666666664, ans=0.002855072463768117
2024-03-09 17:33:03,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=36933.333333333336, ans=0.2
2024-03-09 17:33:21,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=37000.0, ans=0.1
2024-03-09 17:33:30,096 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.01 vs. limit=10.0
2024-03-09 17:33:35,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=37066.666666666664, ans=0.125
2024-03-09 17:34:10,525 INFO [train.py:997] (3/4) Epoch 36, batch 50, loss[loss=0.1393, simple_loss=0.2257, pruned_loss=0.02643, over 24229.00 frames. ], tot_loss[loss=0.141, simple_loss=0.2306, pruned_loss=0.02566, over 1083429.72 frames. ], batch size: 229, lr: 1.27e-02, grad_scale: 64.0
2024-03-09 17:34:17,641 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.91 vs. limit=6.0
2024-03-09 17:34:20,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=37200.0, ans=0.0
2024-03-09 17:34:23,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=37200.0, ans=0.0027826086956521745
2024-03-09 17:34:30,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=37266.666666666664, ans=0.125
2024-03-09 17:34:32,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=37266.666666666664, ans=0.1
2024-03-09 17:34:37,822 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.18 vs. limit=15.0
2024-03-09 17:34:42,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=37333.333333333336, ans=0.125
2024-03-09 17:34:55,597 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.060e+01 6.975e+01 7.752e+01 8.346e+01 1.468e+02, threshold=1.550e+02, percent-clipped=0.0
2024-03-09 17:35:13,888 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.46 vs. limit=15.0
2024-03-09 17:35:16,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=37466.666666666664, ans=0.125
2024-03-09 17:35:16,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=37466.666666666664, ans=0.1
2024-03-09 17:35:19,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=37466.666666666664, ans=0.0
2024-03-09 17:35:28,455 INFO [train.py:997] (3/4) Epoch 36, batch 100, loss[loss=0.1411, simple_loss=0.2352, pruned_loss=0.02345, over 24173.00 frames. ], tot_loss[loss=0.1414, simple_loss=0.2317, pruned_loss=0.02551, over 1901434.56 frames. ], batch size: 327, lr: 1.27e-02, grad_scale: 64.0
2024-03-09 17:35:41,988 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.80 vs. limit=22.5
2024-03-09 17:35:49,914 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.40 vs. limit=15.0
2024-03-09 17:35:58,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=37600.0, ans=0.125
2024-03-09 17:36:17,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=37733.333333333336, ans=0.0
2024-03-09 17:36:20,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=37733.333333333336, ans=0.0
2024-03-09 17:36:34,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=37800.0, ans=0.1
2024-03-09 17:36:36,452 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.70 vs. limit=22.5
2024-03-09 17:36:40,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=37800.0, ans=0.2
2024-03-09 17:36:46,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=37800.0, ans=0.5
2024-03-09 17:36:50,960 INFO [train.py:997] (3/4) Epoch 36, batch 150, loss[loss=0.154, simple_loss=0.2485, pruned_loss=0.02981, over 23956.00 frames. ], tot_loss[loss=0.1407, simple_loss=0.2314, pruned_loss=0.025, over 2528549.44 frames. ], batch size: 416, lr: 1.27e-02, grad_scale: 64.0
2024-03-09 17:37:46,094 INFO [train.py:997] (3/4) Epoch 37, batch 0, loss[loss=0.1261, simple_loss=0.2107, pruned_loss=0.02078, over 23768.00 frames. ], tot_loss[loss=0.1261, simple_loss=0.2107, pruned_loss=0.02078, over 23768.00 frames. ], batch size: 117, lr: 1.25e-02, grad_scale: 64.0
2024-03-09 17:37:46,095 INFO [train.py:1020] (3/4) Computing validation loss
2024-03-09 17:37:55,593 INFO [train.py:1029] (3/4) Epoch 37, validation: loss=0.2112, simple_loss=0.3044, pruned_loss=0.05893, over 452978.00 frames.
2024-03-09 17:37:55,594 INFO [train.py:1030] (3/4) Maximum memory allocated so far is 27673MB
2024-03-09 17:37:58,206 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.39 vs. limit=22.5
2024-03-09 17:38:02,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=37920.0, ans=0.125
2024-03-09 17:38:21,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=37986.666666666664, ans=0.0026115942028985505
2024-03-09 17:38:25,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=37986.666666666664, ans=0.0
2024-03-09 17:38:30,964 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.112e+01 7.137e+01 7.682e+01 8.524e+01 1.300e+02, threshold=1.536e+02, percent-clipped=0.0
2024-03-09 17:38:35,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=38053.333333333336, ans=0.0
2024-03-09 17:39:13,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=38186.666666666664, ans=0.125
2024-03-09 17:39:18,716 INFO [scaling.py:1119] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-03-09 17:39:20,073 INFO [train.py:997] (3/4) Epoch 37, batch 50, loss[loss=0.1372, simple_loss=0.2284, pruned_loss=0.02299, over 24264.00 frames. ], tot_loss[loss=0.1389, simple_loss=0.2299, pruned_loss=0.02396, over 1063994.42 frames. ], batch size: 254, lr: 1.25e-02, grad_scale: 64.0
2024-03-09 17:39:25,531 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.23 vs. limit=15.0
2024-03-09 17:39:27,193 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.11 vs. limit=12.0
2024-03-09 17:39:40,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=38320.0, ans=0.125
2024-03-09 17:39:42,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=38320.0, ans=0.125
2024-03-09 17:39:54,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=38386.666666666664, ans=0.95
2024-03-09 17:40:03,083 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.73 vs. limit=15.0
2024-03-09 17:40:16,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=38453.333333333336, ans=0.2
2024-03-09 17:40:34,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=38520.0, ans=0.125
2024-03-09 17:40:40,615 INFO [train.py:997] (3/4) Epoch 37, batch 100, loss[loss=0.1576, simple_loss=0.2546, pruned_loss=0.03027, over 23809.00 frames. ], tot_loss[loss=0.1395, simple_loss=0.2304, pruned_loss=0.02426, over 1881976.31 frames. ], batch size: 447, lr: 1.25e-02, grad_scale: 64.0
2024-03-09 17:41:13,701 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.59 vs. limit=15.0
2024-03-09 17:41:15,923 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.868e+01 6.991e+01 7.571e+01 8.226e+01 1.121e+02, threshold=1.514e+02, percent-clipped=0.0
2024-03-09 17:42:00,691 INFO [train.py:997] (3/4) Epoch 37, batch 150, loss[loss=0.1369, simple_loss=0.231, pruned_loss=0.02144, over 24177.00 frames. ], tot_loss[loss=0.1398, simple_loss=0.2302, pruned_loss=0.02471, over 2517978.69 frames. ], batch size: 345, lr: 1.24e-02, grad_scale: 64.0
2024-03-09 17:42:52,945 INFO [train.py:997] (3/4) Epoch 38, batch 0, loss[loss=0.14, simple_loss=0.2301, pruned_loss=0.02497, over 24294.00 frames. ], tot_loss[loss=0.14, simple_loss=0.2301, pruned_loss=0.02497, over 24294.00 frames. ], batch size: 281, lr: 1.23e-02, grad_scale: 64.0
2024-03-09 17:42:52,946 INFO [train.py:1020] (3/4) Computing validation loss
2024-03-09 17:43:02,281 INFO [train.py:1029] (3/4) Epoch 38, validation: loss=0.2136, simple_loss=0.3079, pruned_loss=0.05959, over 452978.00 frames.
2024-03-09 17:43:02,281 INFO [train.py:1030] (3/4) Maximum memory allocated so far is 27673MB
2024-03-09 17:43:13,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=38973.333333333336, ans=0.125
2024-03-09 17:43:17,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=38973.333333333336, ans=0.0
2024-03-09 17:43:21,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=39040.0, ans=0.125
2024-03-09 17:43:21,986 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.71 vs. limit=15.0
2024-03-09 17:43:39,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=39106.666666666664, ans=0.125
2024-03-09 17:43:43,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=39106.666666666664, ans=0.2
2024-03-09 17:44:27,811 INFO [train.py:997] (3/4) Epoch 38, batch 50, loss[loss=0.1516, simple_loss=0.2365, pruned_loss=0.03341, over 24172.00 frames. ], tot_loss[loss=0.1376, simple_loss=0.2267, pruned_loss=0.02422, over 1065371.03 frames. ], batch size: 217, lr: 1.22e-02, grad_scale: 64.0
2024-03-09 17:44:31,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=39306.666666666664, ans=0.0023246376811594206
2024-03-09 17:44:46,022 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.38 vs. limit=15.0
2024-03-09 17:44:48,016 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.028e+01 7.170e+01 7.896e+01 8.779e+01 1.113e+02, threshold=1.579e+02, percent-clipped=0.0
2024-03-09 17:44:48,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=39373.333333333336, ans=0.125
2024-03-09 17:44:49,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=39373.333333333336, ans=0.1
2024-03-09 17:45:10,429 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.88 vs. limit=15.0
2024-03-09 17:45:18,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=39506.666666666664, ans=0.04949747468305833
2024-03-09 17:45:34,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=39573.333333333336, ans=0.125
2024-03-09 17:45:46,175 INFO [train.py:997] (3/4) Epoch 38, batch 100, loss[loss=0.1162, simple_loss=0.2123, pruned_loss=0.01007, over 21384.00 frames. ], tot_loss[loss=0.1405, simple_loss=0.2295, pruned_loss=0.02576, over 1882146.18 frames. ], batch size: 718, lr: 1.22e-02, grad_scale: 64.0
2024-03-09 17:46:30,015 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.67 vs. limit=22.5
2024-03-09 17:46:43,686 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.66 vs. limit=15.0
2024-03-09 17:46:47,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=39840.0, ans=0.0
2024-03-09 17:47:07,594 INFO [train.py:997] (3/4) Epoch 38, batch 150, loss[loss=0.1462, simple_loss=0.2428, pruned_loss=0.0248, over 23946.00 frames. ], tot_loss[loss=0.1399, simple_loss=0.2299, pruned_loss=0.02494, over 2501198.54 frames. ], batch size: 387, lr: 1.22e-02, grad_scale: 64.0
2024-03-09 17:48:03,475 INFO [train.py:997] (3/4) Epoch 39, batch 0, loss[loss=0.1398, simple_loss=0.2286, pruned_loss=0.02548, over 24199.00 frames. ], tot_loss[loss=0.1398, simple_loss=0.2286, pruned_loss=0.02548, over 24199.00 frames. ], batch size: 217, lr: 1.20e-02, grad_scale: 64.0
2024-03-09 17:48:03,476 INFO [train.py:1020] (3/4) Computing validation loss
2024-03-09 17:48:11,687 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.7097, 5.2407, 5.6102, 5.2988], device='cuda:3')
2024-03-09 17:48:12,746 INFO [train.py:1029] (3/4) Epoch 39, validation: loss=0.2141, simple_loss=0.3082, pruned_loss=0.06004, over 452978.00 frames.
2024-03-09 17:48:12,746 INFO [train.py:1030] (3/4) Maximum memory allocated so far is 27673MB
2024-03-09 17:48:26,644 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.993e+01 6.884e+01 7.356e+01 8.157e+01 1.068e+02, threshold=1.471e+02, percent-clipped=0.0
2024-03-09 17:48:42,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=40093.333333333336, ans=0.125
2024-03-09 17:49:03,034 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.78 vs. limit=6.0
2024-03-09 17:49:08,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=40226.666666666664, ans=0.1
2024-03-09 17:49:09,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=40226.666666666664, ans=0.0
2024-03-09 17:49:29,147 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.46 vs. limit=15.0
2024-03-09 17:49:41,667 INFO [train.py:997] (3/4) Epoch 39, batch 50, loss[loss=0.1371, simple_loss=0.2304, pruned_loss=0.02189, over 24247.00 frames. ], tot_loss[loss=0.1366, simple_loss=0.2274, pruned_loss=0.02289, over 1077777.57 frames. ], batch size: 281, lr: 1.20e-02, grad_scale: 64.0
2024-03-09 17:49:54,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=40360.0, ans=0.125
2024-03-09 17:50:14,995 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.96 vs. limit=12.0
2024-03-09 17:50:20,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=40493.333333333336, ans=0.035
2024-03-09 17:50:22,754 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.58 vs. limit=15.0
2024-03-09 17:50:26,893 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.23 vs. limit=22.5
2024-03-09 17:50:35,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=40560.0, ans=0.125
2024-03-09 17:50:59,997 INFO [train.py:997] (3/4) Epoch 39, batch 100, loss[loss=0.1445, simple_loss=0.2396, pruned_loss=0.02469, over 24066.00 frames. ], tot_loss[loss=0.1401, simple_loss=0.2304, pruned_loss=0.02493, over 1883949.65 frames. ], batch size: 365, lr: 1.20e-02, grad_scale: 64.0
2024-03-09 17:51:00,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=40693.333333333336, ans=0.125
2024-03-09 17:51:08,194 INFO [scaling.py:1119] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-03-09 17:51:09,406 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.940e+01 6.841e+01 7.461e+01 8.103e+01 1.250e+02, threshold=1.492e+02, percent-clipped=0.0
2024-03-09 17:51:20,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=40760.0, ans=0.0
2024-03-09 17:51:27,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=40760.0, ans=0.0020086956521739134
2024-03-09 17:52:18,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=40960.0, ans=0.2
2024-03-09 17:52:21,037 INFO [train.py:997] (3/4) Epoch 39, batch 150, loss[loss=0.1416, simple_loss=0.2332, pruned_loss=0.02503, over 24144.00 frames. ], tot_loss[loss=0.1388, simple_loss=0.2294, pruned_loss=0.02411, over 2518309.01 frames. ], batch size: 345, lr: 1.20e-02, grad_scale: 64.0
2024-03-09 17:52:27,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=41026.666666666664, ans=0.0
2024-03-09 17:53:16,194 INFO [train.py:997] (3/4) Epoch 40, batch 0, loss[loss=0.1332, simple_loss=0.2279, pruned_loss=0.01927, over 24126.00 frames. ], tot_loss[loss=0.1332, simple_loss=0.2279, pruned_loss=0.01927, over 24126.00 frames. ], batch size: 366, lr: 1.18e-02, grad_scale: 64.0
2024-03-09 17:53:16,194 INFO [train.py:1020] (3/4) Computing validation loss
2024-03-09 17:53:25,708 INFO [train.py:1029] (3/4) Epoch 40, validation: loss=0.2148, simple_loss=0.3085, pruned_loss=0.06058, over 452978.00 frames.
2024-03-09 17:53:25,709 INFO [train.py:1030] (3/4) Maximum memory allocated so far is 27673MB
2024-03-09 17:53:59,429 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.98 vs. limit=10.0
2024-03-09 17:54:05,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=41213.333333333336, ans=0.125
2024-03-09 17:54:07,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=41213.333333333336, ans=0.1
2024-03-09 17:54:12,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=41213.333333333336, ans=0.125
2024-03-09 17:54:20,635 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.20 vs. limit=15.0
2024-03-09 17:54:47,012 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.979e+01 7.013e+01 7.603e+01 8.055e+01 1.247e+02, threshold=1.521e+02, percent-clipped=0.0
2024-03-09 17:54:51,547 INFO [train.py:997] (3/4) Epoch 40, batch 50, loss[loss=0.1389, simple_loss=0.2284, pruned_loss=0.02473, over 24190.00 frames. ], tot_loss[loss=0.1375, simple_loss=0.2284, pruned_loss=0.02328, over 1068897.50 frames. ], batch size: 280, lr: 1.18e-02, grad_scale: 64.0
2024-03-09 17:54:58,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=41413.333333333336, ans=0.04949747468305833
2024-03-09 17:55:10,278 INFO [scaling.py:1119] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-03-09 17:55:31,194 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.17 vs. limit=15.0
2024-03-09 17:55:34,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=41546.666666666664, ans=0.025
2024-03-09 17:55:58,723 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.44 vs. limit=15.0
2024-03-09 17:56:11,517 INFO [train.py:997] (3/4) Epoch 40, batch 100, loss[loss=0.1397, simple_loss=0.2378, pruned_loss=0.02077, over 24135.00 frames. ], tot_loss[loss=0.137, simple_loss=0.2274, pruned_loss=0.02327, over 1889769.59 frames. ], batch size: 366, lr: 1.18e-02, grad_scale: 64.0
2024-03-09 17:56:17,034 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.60 vs. limit=15.0
2024-03-09 17:56:23,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=41746.666666666664, ans=0.2
2024-03-09 17:56:37,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=41813.333333333336, ans=0.0017797101449275356
2024-03-09 17:56:47,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=41880.0, ans=0.0017652173913043478
2024-03-09 17:57:15,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=42013.333333333336, ans=0.2
2024-03-09 17:57:18,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=42013.333333333336, ans=0.2
2024-03-09 17:57:25,919 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.807e+01 6.999e+01 7.479e+01 8.341e+01 1.133e+02, threshold=1.496e+02, percent-clipped=0.0
2024-03-09 17:57:29,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=42080.0, ans=0.025
2024-03-09 17:57:30,894 INFO [train.py:997] (3/4) Epoch 40, batch 150, loss[loss=0.128, simple_loss=0.2208, pruned_loss=0.01763, over 22982.00 frames. ], tot_loss[loss=0.137, simple_loss=0.2273, pruned_loss=0.02336, over 2516987.52 frames. ], batch size: 609, lr: 1.18e-02, grad_scale: 64.0
2024-03-09 17:57:32,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=42080.0, ans=0.0
2024-03-09 17:57:34,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=42080.0, ans=0.1
2024-03-09 17:58:21,372 INFO [train.py:997] (3/4) Epoch 41, batch 0, loss[loss=0.131, simple_loss=0.2185, pruned_loss=0.02179, over 24135.00 frames. ], tot_loss[loss=0.131, simple_loss=0.2185, pruned_loss=0.02179, over 24135.00 frames. ], batch size: 240, lr: 1.16e-02, grad_scale: 64.0
2024-03-09 17:58:21,372 INFO [train.py:1020] (3/4) Computing validation loss
2024-03-09 17:58:30,940 INFO [train.py:1029] (3/4) Epoch 41, validation: loss=0.2136, simple_loss=0.3076, pruned_loss=0.05982, over 452978.00 frames.
2024-03-09 17:58:30,941 INFO [train.py:1030] (3/4) Maximum memory allocated so far is 27673MB
2024-03-09 17:58:47,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=42200.0, ans=0.125
2024-03-09 17:58:52,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=42200.0, ans=0.0016956521739130443
2024-03-09 17:59:24,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=42333.333333333336, ans=0.125
2024-03-09 17:59:41,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=42400.0, ans=0.2
2024-03-09 17:59:53,550 INFO [train.py:997] (3/4) Epoch 41, batch 50, loss[loss=0.1266, simple_loss=0.2204, pruned_loss=0.01633, over 24085.00 frames. ], tot_loss[loss=0.1351, simple_loss=0.2258, pruned_loss=0.02225, over 1067454.82 frames. ], batch size: 344, lr: 1.16e-02, grad_scale: 64.0
2024-03-09 18:00:19,292 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.45 vs. limit=15.0
2024-03-09 18:00:37,651 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.37 vs. limit=15.0
2024-03-09 18:00:38,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=42600.0, ans=0.125
2024-03-09 18:00:55,611 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.788e+01 7.025e+01 7.943e+01 8.921e+01 1.202e+02, threshold=1.589e+02, percent-clipped=0.0
2024-03-09 18:01:00,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=42733.333333333336, ans=0.001579710144927535
2024-03-09 18:01:05,928 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.09 vs. limit=15.0
2024-03-09 18:01:11,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=42733.333333333336, ans=0.0
2024-03-09 18:01:14,024 INFO [train.py:997] (3/4) Epoch 41, batch 100, loss[loss=0.1442, simple_loss=0.2327, pruned_loss=0.02788, over 23047.00 frames. ], tot_loss[loss=0.1369, simple_loss=0.2276, pruned_loss=0.02308, over 1884862.24 frames. ], batch size: 102, lr: 1.16e-02, grad_scale: 64.0
2024-03-09 18:01:33,418 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.97 vs. limit=22.5
2024-03-09 18:01:38,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=42866.666666666664, ans=0.125
2024-03-09 18:01:39,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten.whitening_limit, batch_count=42866.666666666664, ans=22.5
2024-03-09 18:01:55,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=42933.333333333336, ans=0.125
2024-03-09 18:02:03,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=43000.0, ans=0.125
2024-03-09 18:02:05,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=43000.0, ans=0.125
2024-03-09 18:02:11,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=43000.0, ans=0.125
2024-03-09 18:02:34,745 INFO [train.py:997] (3/4) Epoch 41, batch 150, loss[loss=0.1369, simple_loss=0.2339, pruned_loss=0.01992, over 23932.00 frames. ], tot_loss[loss=0.1372, simple_loss=0.228, pruned_loss=0.02321, over 2527203.76 frames. ], batch size: 387, lr: 1.16e-02, grad_scale: 64.0
2024-03-09 18:02:35,558 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0
2024-03-09 18:02:37,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=43133.333333333336, ans=0.2
2024-03-09 18:02:39,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=43133.333333333336, ans=0.0014927536231884048
2024-03-09 18:03:28,793 INFO [train.py:997] (3/4) Epoch 42, batch 0, loss[loss=0.1431, simple_loss=0.2415, pruned_loss=0.02238, over 23966.00 frames. ], tot_loss[loss=0.1431, simple_loss=0.2415, pruned_loss=0.02238, over 23966.00 frames. ], batch size: 416, lr: 1.14e-02, grad_scale: 64.0
2024-03-09 18:03:28,793 INFO [train.py:1020] (3/4) Computing validation loss
2024-03-09 18:03:38,340 INFO [train.py:1029] (3/4) Epoch 42, validation: loss=0.2135, simple_loss=0.3075, pruned_loss=0.05972, over 452978.00 frames.
2024-03-09 18:03:38,341 INFO [train.py:1030] (3/4) Maximum memory allocated so far is 27673MB
2024-03-09 18:04:03,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=43253.333333333336, ans=0.0
2024-03-09 18:04:04,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=43253.333333333336, ans=0.0014666666666666665
2024-03-09 18:04:07,114 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.65 vs. limit=12.0
2024-03-09 18:04:09,198 INFO [scaling.py:1119] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-03-09 18:04:12,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=43320.0, ans=0.0
2024-03-09 18:04:12,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=43320.0, ans=0.0
2024-03-09 18:04:29,006 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.865e+01 6.812e+01 7.244e+01 8.018e+01 1.063e+02, threshold=1.449e+02, percent-clipped=0.0
2024-03-09 18:04:50,271 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.55 vs. limit=15.0
2024-03-09 18:04:58,767 INFO [train.py:997] (3/4) Epoch 42, batch 50, loss[loss=0.1467, simple_loss=0.2284, pruned_loss=0.03251, over 23906.00 frames. ], tot_loss[loss=0.1341, simple_loss=0.2251, pruned_loss=0.02157, over 1069473.35 frames. ], batch size: 153, lr: 1.14e-02, grad_scale: 64.0
2024-03-09 18:05:06,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=43520.0, ans=0.125
2024-03-09 18:05:19,865 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.17 vs. limit=15.0
2024-03-09 18:05:49,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=43720.0, ans=0.125
2024-03-09 18:05:57,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=43720.0, ans=0.035
2024-03-09 18:06:20,948 INFO [train.py:997] (3/4) Epoch 42, batch 100, loss[loss=0.1437, simple_loss=0.2436, pruned_loss=0.02187, over 23829.00 frames. ], tot_loss[loss=0.1342, simple_loss=0.225, pruned_loss=0.02169, over 1881314.16 frames. ], batch size: 447, lr: 1.14e-02, grad_scale: 64.0
2024-03-09 18:06:33,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=43853.333333333336, ans=0.1
2024-03-09 18:07:07,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=44053.333333333336, ans=0.001292753623188406
2024-03-09 18:07:09,737 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.750e+01 6.712e+01 7.266e+01 7.977e+01 1.080e+02, threshold=1.453e+02, percent-clipped=0.0
2024-03-09 18:07:10,771 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.30 vs. limit=6.0
2024-03-09 18:07:24,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=44120.0, ans=0.2
2024-03-09 18:07:39,991 INFO [train.py:997] (3/4) Epoch 42, batch 150, loss[loss=0.1351, simple_loss=0.2215, pruned_loss=0.02432, over 20004.00 frames. ], tot_loss[loss=0.1355, simple_loss=0.2269, pruned_loss=0.0221, over 2516694.39 frames. ], batch size: 60, lr: 1.14e-02, grad_scale: 64.0
2024-03-09 18:07:40,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=44186.666666666664, ans=0.2
2024-03-09 18:08:31,601 INFO [train.py:997] (3/4) Epoch 43, batch 0, loss[loss=0.1485, simple_loss=0.2454, pruned_loss=0.02575, over 23705.00 frames. ], tot_loss[loss=0.1485, simple_loss=0.2454, pruned_loss=0.02575, over 23705.00 frames. ], batch size: 485, lr: 1.12e-02, grad_scale: 64.0
2024-03-09 18:08:31,602 INFO [train.py:1020] (3/4) Computing validation loss
2024-03-09 18:08:41,004 INFO [train.py:1029] (3/4) Epoch 43, validation: loss=0.2134, simple_loss=0.3077, pruned_loss=0.05952, over 452978.00 frames.
2024-03-09 18:08:41,005 INFO [train.py:1030] (3/4) Maximum memory allocated so far is 27673MB
2024-03-09 18:08:53,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=44240.0, ans=0.125
2024-03-09 18:09:01,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=44306.666666666664, ans=0.0
2024-03-09 18:09:50,359 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.16 vs. limit=15.0
2024-03-09 18:09:50,990 INFO [scaling.py:1119] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-03-09 18:09:58,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=44506.666666666664, ans=0.2
2024-03-09 18:10:01,088 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.93 vs. limit=15.0
2024-03-09 18:10:01,381 INFO [train.py:997] (3/4) Epoch 43, batch 50, loss[loss=0.1285, simple_loss=0.2119, pruned_loss=0.02257, over 20430.00 frames. ], tot_loss[loss=0.1379, simple_loss=0.2283, pruned_loss=0.02375, over 1072487.93 frames. ], batch size: 62, lr: 1.12e-02, grad_scale: 64.0
2024-03-09 18:10:36,516 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.916e+01 6.864e+01 7.263e+01 8.155e+01 1.054e+02, threshold=1.453e+02, percent-clipped=0.0
2024-03-09 18:10:40,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=44706.666666666664, ans=0.125
2024-03-09 18:10:46,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=44773.333333333336, ans=0.125
2024-03-09 18:11:19,221 INFO [train.py:997] (3/4) Epoch 43, batch 100, loss[loss=0.1368, simple_loss=0.2324, pruned_loss=0.02057, over 24162.00 frames. ], tot_loss[loss=0.1356, simple_loss=0.2261, pruned_loss=0.02253, over 1889264.58 frames. ], batch size: 345, lr: 1.12e-02, grad_scale: 64.0
2024-03-09 18:12:10,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=45106.666666666664, ans=0.125
2024-03-09 18:12:33,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=45173.333333333336, ans=0.0
2024-03-09 18:12:34,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=45173.333333333336, ans=0.0
2024-03-09 18:12:40,863 INFO [train.py:997] (3/4) Epoch 43, batch 150, loss[loss=0.113, simple_loss=0.2075, pruned_loss=0.009225, over 21451.00 frames. ], tot_loss[loss=0.1355, simple_loss=0.2267, pruned_loss=0.02216, over 2516550.55 frames. ], batch size: 718, lr: 1.12e-02, grad_scale: 32.0
2024-03-09 18:12:49,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=45240.0, ans=0.125
2024-03-09 18:13:36,399 INFO [train.py:997] (3/4) Epoch 44, batch 0, loss[loss=0.1242, simple_loss=0.2121, pruned_loss=0.0181, over 23603.00 frames. ], tot_loss[loss=0.1242, simple_loss=0.2121, pruned_loss=0.0181, over 23603.00 frames. ], batch size: 128, lr: 1.10e-02, grad_scale: 32.0
2024-03-09 18:13:36,399 INFO [train.py:1020] (3/4) Computing validation loss
2024-03-09 18:13:45,433 INFO [train.py:1029] (3/4) Epoch 44, validation: loss=0.2121, simple_loss=0.3064, pruned_loss=0.05891, over 452978.00 frames.
2024-03-09 18:13:45,434 INFO [train.py:1030] (3/4) Maximum memory allocated so far is 27673MB
2024-03-09 18:14:02,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=45293.333333333336, ans=0.125
2024-03-09 18:14:06,614 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.85 vs. limit=12.0
2024-03-09 18:14:07,943 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=2.92 vs. limit=12.0
2024-03-09 18:14:19,825 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.880e+01 6.918e+01 7.525e+01 8.097e+01 1.200e+02, threshold=1.505e+02, percent-clipped=0.0
2024-03-09 18:15:05,780 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.19 vs. limit=15.0
2024-03-09 18:15:09,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=45560.0, ans=0.2
2024-03-09 18:15:12,599 INFO [train.py:997] (3/4) Epoch 44, batch 50, loss[loss=0.1368, simple_loss=0.2234, pruned_loss=0.02505, over 24053.00 frames. ], tot_loss[loss=0.1364, simple_loss=0.2266, pruned_loss=0.02309, over 1070767.10 frames. ], batch size: 165, lr: 1.10e-02, grad_scale: 32.0
2024-03-09 18:15:15,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=45626.666666666664, ans=0.125
2024-03-09 18:15:22,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=45626.666666666664, ans=0.125
2024-03-09 18:15:48,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=45760.0, ans=0.125
2024-03-09 18:15:52,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=45760.0, ans=0.125
2024-03-09 18:15:58,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=45826.666666666664, ans=0.2
2024-03-09 18:16:20,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=45893.333333333336, ans=0.125
2024-03-09 18:16:30,683 INFO [train.py:997] (3/4) Epoch 44, batch 100, loss[loss=0.1365, simple_loss=0.2287, pruned_loss=0.02215, over 24256.00 frames. ], tot_loss[loss=0.1377, simple_loss=0.2287, pruned_loss=0.0234, over 1887612.04 frames. ], batch size: 281, lr: 1.10e-02, grad_scale: 16.0
2024-03-09 18:16:34,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=45960.0, ans=0.95
2024-03-09 18:16:45,383 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.21 vs. limit=6.0
2024-03-09 18:16:49,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=46026.666666666664, ans=0.125
2024-03-09 18:16:56,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=46026.666666666664, ans=0.0008637681159420294
2024-03-09 18:17:01,035 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.671e+01 6.824e+01 7.356e+01 8.103e+01 1.148e+02, threshold=1.471e+02, percent-clipped=0.0
2024-03-09 18:17:09,518 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.68 vs. limit=22.5
2024-03-09 18:17:22,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=46160.0, ans=0.04949747468305833
2024-03-09 18:17:36,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=46226.666666666664, ans=0.0008202898550724643
2024-03-09 18:17:46,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=46226.666666666664, ans=0.0
2024-03-09 18:17:51,976 INFO [train.py:997] (3/4) Epoch 44, batch 150, loss[loss=0.1265, simple_loss=0.2256, pruned_loss=0.01369, over 24226.00 frames. ], tot_loss[loss=0.1366, simple_loss=0.2276, pruned_loss=0.02283, over 2517182.93 frames. ], batch size: 327, lr: 1.10e-02, grad_scale: 16.0
2024-03-09 18:18:43,508 INFO [train.py:997] (3/4) Epoch 45, batch 0, loss[loss=0.1409, simple_loss=0.2269, pruned_loss=0.02746, over 24064.00 frames. ], tot_loss[loss=0.1409, simple_loss=0.2269, pruned_loss=0.02746, over 24064.00 frames. ], batch size: 165, lr: 1.09e-02, grad_scale: 32.0
2024-03-09 18:18:43,509 INFO [train.py:1020] (3/4) Computing validation loss
2024-03-09 18:18:53,093 INFO [train.py:1029] (3/4) Epoch 45, validation: loss=0.2137, simple_loss=0.3089, pruned_loss=0.05927, over 452978.00 frames.
2024-03-09 18:18:53,094 INFO [train.py:1030] (3/4) Maximum memory allocated so far is 27673MB
2024-03-09 18:19:05,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=46346.666666666664, ans=0.1
2024-03-09 18:19:38,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=46480.0, ans=0.125
2024-03-09 18:19:39,048 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.54 vs. limit=15.0
2024-03-09 18:19:45,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=46546.666666666664, ans=0.1
2024-03-09 18:19:51,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=46546.666666666664, ans=0.125
2024-03-09 18:19:55,489 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.38 vs. limit=15.0
2024-03-09 18:20:07,000 INFO [scaling.py:1119] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-03-09 18:20:11,480 INFO [scaling.py:1119] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-03-09 18:20:16,257 INFO [train.py:997] (3/4) Epoch 45, batch 50, loss[loss=0.1298, simple_loss=0.2171, pruned_loss=0.02127, over 24318.00 frames. ], tot_loss[loss=0.135, simple_loss=0.2262, pruned_loss=0.02193, over 1073449.42 frames. ], batch size: 208, lr: 1.08e-02, grad_scale: 32.0
2024-03-09 18:20:22,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=46680.0, ans=0.1
2024-03-09 18:20:29,932 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.843e+01 6.817e+01 7.386e+01 8.152e+01 1.203e+02, threshold=1.477e+02, percent-clipped=0.0
2024-03-09 18:20:39,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=46746.666666666664, ans=0.5
2024-03-09 18:20:43,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=46746.666666666664, ans=0.05
2024-03-09 18:20:47,466 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.88 vs. limit=10.0
2024-03-09 18:20:51,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=46813.333333333336, ans=0.125
2024-03-09 18:20:51,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=46813.333333333336, ans=0.1
2024-03-09 18:21:25,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=46946.666666666664, ans=0.0006637681159420306
2024-03-09 18:21:35,473 INFO [train.py:997] (3/4) Epoch 45, batch 100, loss[loss=0.1357, simple_loss=0.2272, pruned_loss=0.0221, over 24266.00 frames. ], tot_loss[loss=0.135, simple_loss=0.226, pruned_loss=0.02198, over 1890623.48 frames. ], batch size: 311, lr: 1.08e-02, grad_scale: 32.0
2024-03-09 18:22:22,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=47213.333333333336, ans=0.2
2024-03-09 18:22:22,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=47213.333333333336, ans=0.0
2024-03-09 18:22:41,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=47280.0, ans=0.1
2024-03-09 18:22:45,466 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.92 vs. limit=15.0
2024-03-09 18:22:55,744 INFO [train.py:997] (3/4) Epoch 45, batch 150, loss[loss=0.139, simple_loss=0.2273, pruned_loss=0.02528, over 24223.00 frames. ], tot_loss[loss=0.1347, simple_loss=0.2259, pruned_loss=0.02176, over 2515683.97 frames. ], batch size: 229, lr: 1.08e-02, grad_scale: 16.0
2024-03-09 18:22:59,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=47346.666666666664, ans=0.2
2024-03-09 18:23:50,625 INFO [train.py:997] (3/4) Epoch 46, batch 0, loss[loss=0.1476, simple_loss=0.2425, pruned_loss=0.02637, over 23723.00 frames. ], tot_loss[loss=0.1476, simple_loss=0.2425, pruned_loss=0.02637, over 23723.00 frames. ], batch size: 486, lr: 1.07e-02, grad_scale: 16.0
2024-03-09 18:23:50,626 INFO [train.py:1020] (3/4) Computing validation loss
2024-03-09 18:24:00,487 INFO [train.py:1029] (3/4) Epoch 46, validation: loss=0.2142, simple_loss=0.3085, pruned_loss=0.05997, over 452978.00 frames.
2024-03-09 18:24:00,488 INFO [train.py:1030] (3/4) Maximum memory allocated so far is 27673MB
2024-03-09 18:24:05,180 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.866e+01 6.849e+01 7.495e+01 7.996e+01 1.078e+02, threshold=1.499e+02, percent-clipped=0.0
2024-03-09 18:24:06,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=47400.0, ans=0.125
2024-03-09 18:24:14,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=47400.0, ans=0.2
2024-03-09 18:24:16,593 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.58 vs. limit=15.0
2024-03-09 18:24:20,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=47466.666666666664, ans=0.2
2024-03-09 18:24:26,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=47466.666666666664, ans=0.0005507246376811603
2024-03-09 18:24:27,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=47466.666666666664, ans=0.1
2024-03-09 18:24:28,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=47466.666666666664, ans=15.0
2024-03-09 18:24:37,977 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.34 vs. limit=15.0
2024-03-09 18:24:41,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=47533.333333333336, ans=0.0005362318840579708
2024-03-09 18:24:58,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=47600.0, ans=0.125
2024-03-09 18:25:25,827 INFO [train.py:997] (3/4) Epoch 46, batch 50, loss[loss=0.1312, simple_loss=0.2258, pruned_loss=0.01824, over 24210.00 frames. ], tot_loss[loss=0.1321, simple_loss=0.2231, pruned_loss=0.02053, over 1071835.54 frames. ], batch size: 295, lr: 1.07e-02, grad_scale: 16.0
2024-03-09 18:26:45,329 INFO [train.py:997] (3/4) Epoch 46, batch 100, loss[loss=0.1194, simple_loss=0.2152, pruned_loss=0.01181, over 22860.00 frames. ], tot_loss[loss=0.1329, simple_loss=0.2239, pruned_loss=0.02092, over 1888353.71 frames. ], batch size: 608, lr: 1.06e-02, grad_scale: 16.0
2024-03-09 18:26:49,977 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.653e+01 6.627e+01 7.164e+01 7.678e+01 1.012e+02, threshold=1.433e+02, percent-clipped=0.0
2024-03-09 18:27:05,755 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.46 vs. limit=15.0
2024-03-09 18:27:55,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=48333.333333333336, ans=0.1
2024-03-09 18:28:06,136 INFO [train.py:997] (3/4) Epoch 46, batch 150, loss[loss=0.1635, simple_loss=0.2528, pruned_loss=0.0371, over 23209.00 frames. ], tot_loss[loss=0.1342, simple_loss=0.2259, pruned_loss=0.02125, over 2526246.99 frames. ], batch size: 534, lr: 1.06e-02, grad_scale: 16.0
2024-03-09 18:28:08,672 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.24 vs. limit=15.0
2024-03-09 18:29:00,564 INFO [train.py:997] (3/4) Epoch 47, batch 0, loss[loss=0.134, simple_loss=0.2278, pruned_loss=0.02012, over 24208.00 frames. ], tot_loss[loss=0.134, simple_loss=0.2278, pruned_loss=0.02012, over 24208.00 frames. ], batch size: 295, lr: 1.05e-02, grad_scale: 32.0
2024-03-09 18:29:00,565 INFO [train.py:1020] (3/4) Computing validation loss
2024-03-09 18:29:10,389 INFO [train.py:1029] (3/4) Epoch 47, validation: loss=0.2152, simple_loss=0.3095, pruned_loss=0.06041, over 452978.00 frames.
2024-03-09 18:29:10,390 INFO [train.py:1030] (3/4) Maximum memory allocated so far is 27673MB
2024-03-09 18:29:11,567 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.05 vs. limit=22.5
2024-03-09 18:29:15,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=48453.333333333336, ans=0.125
2024-03-09 18:29:19,097 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.59 vs. limit=10.0
2024-03-09 18:29:42,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=48586.666666666664, ans=0.1
2024-03-09 18:30:28,053 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.832e+01 6.822e+01 7.253e+01 7.989e+01 1.051e+02, threshold=1.451e+02, percent-clipped=0.0
2024-03-09 18:30:34,258 INFO [train.py:997] (3/4) Epoch 47, batch 50, loss[loss=0.1498, simple_loss=0.2312, pruned_loss=0.0342, over 23977.00 frames. ], tot_loss[loss=0.1308, simple_loss=0.2213, pruned_loss=0.02016, over 1069116.76 frames. ], batch size: 153, lr: 1.05e-02, grad_scale: 16.0
2024-03-09 18:30:41,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=48786.666666666664, ans=0.125
2024-03-09 18:31:08,154 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.63 vs. limit=15.0
2024-03-09 18:31:09,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=48920.0, ans=0.00023478260869565226
2024-03-09 18:31:12,984 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.91 vs. limit=6.0
2024-03-09 18:31:25,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=48986.666666666664, ans=0.125
2024-03-09 18:31:28,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=48986.666666666664, ans=0.2
2024-03-09 18:31:28,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=48986.666666666664, ans=0.0
2024-03-09 18:31:44,546 INFO [scaling.py:1119] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-03-09 18:31:53,479 INFO [train.py:997] (3/4) Epoch 47, batch 100, loss[loss=0.1403, simple_loss=0.2377, pruned_loss=0.02142, over 23945.00 frames. ], tot_loss[loss=0.134, simple_loss=0.225, pruned_loss=0.02154, over 1881478.66 frames. ], batch size: 387, lr: 1.05e-02, grad_scale: 8.0
2024-03-09 18:32:02,423 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.81 vs. limit=15.0
2024-03-09 18:32:05,335 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.83 vs. limit=10.0
2024-03-09 18:32:43,260 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=49320.0, ans=0.125
2024-03-09 18:32:48,835 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.55 vs. limit=15.0
2024-03-09 18:32:54,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=49320.0, ans=0.0
2024-03-09 18:32:58,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=49386.666666666664, ans=0.0
2024-03-09 18:33:07,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=49386.666666666664, ans=0.0
2024-03-09 18:33:10,428 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.939e+01 7.123e+01 7.707e+01 8.583e+01 1.160e+02, threshold=1.541e+02, percent-clipped=0.0
2024-03-09 18:33:15,538 INFO [train.py:997] (3/4) Epoch 47, batch 150, loss[loss=0.1354, simple_loss=0.2301, pruned_loss=0.02033, over 24039.00 frames. ], tot_loss[loss=0.1345, simple_loss=0.2259, pruned_loss=0.02161, over 2511654.63 frames. ], batch size: 344, lr: 1.05e-02, grad_scale: 8.0
2024-03-09 18:33:15,814 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=49453.333333333336, ans=0.1
2024-03-09 18:34:03,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=49506.666666666664, ans=0.125
2024-03-09 18:34:05,686 INFO [train.py:997] (3/4) Epoch 48, batch 0, loss[loss=0.118, simple_loss=0.2089, pruned_loss=0.01354, over 23939.00 frames. ], tot_loss[loss=0.118, simple_loss=0.2089, pruned_loss=0.01354, over 23939.00 frames. ], batch size: 142, lr: 1.03e-02, grad_scale: 16.0
2024-03-09 18:34:05,687 INFO [train.py:1020] (3/4) Computing validation loss
2024-03-09 18:34:15,169 INFO [train.py:1029] (3/4) Epoch 48, validation: loss=0.2149, simple_loss=0.3083, pruned_loss=0.06081, over 452978.00 frames.
2024-03-09 18:34:15,170 INFO [train.py:1030] (3/4) Maximum memory allocated so far is 27673MB
2024-03-09 18:34:33,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=49573.333333333336, ans=0.0
2024-03-09 18:34:44,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=49573.333333333336, ans=0.1
2024-03-09 18:34:45,546 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.30 vs. limit=10.0
2024-03-09 18:35:04,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=49640.0, ans=0.125
2024-03-09 18:35:24,460 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.74 vs. limit=12.0
2024-03-09 18:35:39,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=49840.0, ans=0.125
2024-03-09 18:35:40,447 INFO [train.py:997] (3/4) Epoch 48, batch 50, loss[loss=0.1303, simple_loss=0.2189, pruned_loss=0.02089, over 24226.00 frames. ], tot_loss[loss=0.1329, simple_loss=0.2236, pruned_loss=0.02108, over 1074716.06 frames. ], batch size: 217, lr: 1.03e-02, grad_scale: 16.0
2024-03-09 18:35:41,581 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.62 vs. limit=15.0
2024-03-09 18:35:53,152 INFO [scaling.py:1119] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-03-09 18:36:03,370 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.57 vs. limit=15.0
2024-03-09 18:36:11,064 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.01 vs. limit=10.0
2024-03-09 18:36:16,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=49973.333333333336, ans=0.125
2024-03-09 18:36:22,286 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.84 vs. limit=15.0
2024-03-09 18:36:32,171 INFO [scaling.py:1119] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-03-09 18:36:39,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=50040.0, ans=0.1
2024-03-09 18:36:39,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=50040.0, ans=0.125
2024-03-09 18:36:42,443 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.770e+01 6.729e+01 7.301e+01 8.005e+01 9.735e+01, threshold=1.460e+02, percent-clipped=0.0
2024-03-09 18:36:47,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=50106.666666666664, ans=0.2
2024-03-09 18:36:59,143 INFO [train.py:997] (3/4) Epoch 48, batch 100, loss[loss=0.1205, simple_loss=0.2024, pruned_loss=0.01929, over 23722.00 frames. ], tot_loss[loss=0.1341, simple_loss=0.2248, pruned_loss=0.02172, over 1889845.79 frames. ], batch size: 117, lr: 1.03e-02, grad_scale: 16.0
2024-03-09 18:37:25,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=50240.0, ans=0.07
2024-03-09 18:37:34,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=50306.666666666664, ans=0.2
2024-03-09 18:37:50,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=50373.333333333336, ans=0.0
2024-03-09 18:38:02,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=50373.333333333336, ans=0.0
2024-03-09 18:38:02,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=50373.333333333336, ans=0.125
2024-03-09 18:38:20,105 INFO [train.py:997] (3/4) Epoch 48, batch 150, loss[loss=0.1282, simple_loss=0.2205, pruned_loss=0.0179, over 24263.00 frames. ], tot_loss[loss=0.1335, simple_loss=0.2247, pruned_loss=0.02112, over 2507628.60 frames. ], batch size: 241, lr: 1.03e-02, grad_scale: 8.0
2024-03-09 18:38:29,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=50506.666666666664, ans=0.09899494936611666
2024-03-09 18:38:29,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=50506.666666666664, ans=0.0
2024-03-09 18:39:15,061 INFO [train.py:997] (3/4) Epoch 49, batch 0, loss[loss=0.1372, simple_loss=0.2326, pruned_loss=0.02094, over 23746.00 frames. ], tot_loss[loss=0.1372, simple_loss=0.2326, pruned_loss=0.02094, over 23746.00 frames. ], batch size: 486, lr: 1.02e-02, grad_scale: 16.0
2024-03-09 18:39:15,062 INFO [train.py:1020] (3/4) Computing validation loss
2024-03-09 18:39:24,778 INFO [train.py:1029] (3/4) Epoch 49, validation: loss=0.2171, simple_loss=0.31, pruned_loss=0.06203, over 452978.00 frames.
2024-03-09 18:39:24,779 INFO [train.py:1030] (3/4) Maximum memory allocated so far is 27673MB
2024-03-09 18:39:45,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=50626.666666666664, ans=0.125
2024-03-09 18:39:51,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=50626.666666666664, ans=0.0
2024-03-09 18:40:11,811 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.90 vs. limit=15.0
2024-03-09 18:40:12,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=50693.333333333336, ans=0.2
2024-03-09 18:40:23,493 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.861e+01 6.914e+01 7.599e+01 8.430e+01 1.205e+02, threshold=1.520e+02, percent-clipped=0.0
2024-03-09 18:40:37,111 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.78 vs. limit=15.0
2024-03-09 18:40:47,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=50826.666666666664, ans=0.125
2024-03-09 18:40:51,381 INFO [train.py:997] (3/4) Epoch 49, batch 50, loss[loss=0.1153, simple_loss=0.2104, pruned_loss=0.01008, over 21415.00 frames. ], tot_loss[loss=0.1328, simple_loss=0.2238, pruned_loss=0.02091, over 1064043.14 frames. ], batch size: 718, lr: 1.02e-02, grad_scale: 16.0
2024-03-09 18:41:22,009 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.91 vs. limit=15.0
2024-03-09 18:41:40,434 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.77 vs. limit=12.0
2024-03-09 18:41:51,339 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.20 vs. limit=15.0
2024-03-09 18:42:10,662 INFO [train.py:997] (3/4) Epoch 49, batch 100, loss[loss=0.1366, simple_loss=0.2256, pruned_loss=0.02384, over 24195.00 frames. ], tot_loss[loss=0.1321, simple_loss=0.2229, pruned_loss=0.02069, over 1879066.52 frames. ], batch size: 188, lr: 1.01e-02, grad_scale: 8.0
2024-03-09 18:42:12,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=51226.666666666664, ans=0.125
2024-03-09 18:42:18,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=51226.666666666664, ans=0.05
2024-03-09 18:42:18,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=51226.666666666664, ans=0.125
2024-03-09 18:42:25,546 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.80 vs. limit=22.5
2024-03-09 18:42:27,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=51293.333333333336, ans=0.1
2024-03-09 18:42:41,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=51293.333333333336, ans=0.0
2024-03-09 18:43:04,162 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.035e+01 6.804e+01 7.380e+01 7.884e+01 1.078e+02, threshold=1.476e+02, percent-clipped=0.0
2024-03-09 18:43:17,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=51493.333333333336, ans=0.0
2024-03-09 18:43:30,600 INFO [train.py:997] (3/4) Epoch 49, batch 150, loss[loss=0.1279, simple_loss=0.2197, pruned_loss=0.0181, over 24245.00 frames. ], tot_loss[loss=0.1335, simple_loss=0.2243, pruned_loss=0.02132, over 2507386.41 frames. ], batch size: 198, lr: 1.01e-02, grad_scale: 8.0
2024-03-09 18:43:39,347 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.74 vs. limit=6.0
2024-03-09 18:44:22,357 INFO [train.py:997] (3/4) Epoch 50, batch 0, loss[loss=0.134, simple_loss=0.2338, pruned_loss=0.0171, over 23883.00 frames. ], tot_loss[loss=0.134, simple_loss=0.2338, pruned_loss=0.0171, over 23883.00 frames. ], batch size: 447, lr: 1.00e-02, grad_scale: 16.0
2024-03-09 18:44:22,357 INFO [train.py:1020] (3/4) Computing validation loss
2024-03-09 18:44:30,928 INFO [zipformer.py:1858] (3/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([1.2012, 3.7216, 3.9667, 2.6843], device='cuda:3')
2024-03-09 18:44:31,920 INFO [train.py:1029] (3/4) Epoch 50, validation: loss=0.2164, simple_loss=0.3113, pruned_loss=0.06071, over 452978.00 frames.
2024-03-09 18:44:31,920 INFO [train.py:1030] (3/4) Maximum memory allocated so far is 27673MB
2024-03-09 18:44:32,930 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.00 vs. limit=22.5
2024-03-09 18:44:42,405 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.02 vs. limit=15.0
2024-03-09 18:45:29,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=51813.333333333336, ans=0.09899494936611666
2024-03-09 18:45:40,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=51880.0, ans=0.0
2024-03-09 18:45:40,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=51880.0, ans=0.0
2024-03-09 18:45:42,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=51880.0, ans=0.125
2024-03-09 18:45:57,123 INFO [train.py:997] (3/4) Epoch 50, batch 50, loss[loss=0.1173, simple_loss=0.2122, pruned_loss=0.01116, over 22836.00 frames. ], tot_loss[loss=0.131, simple_loss=0.2224, pruned_loss=0.01983, over 1065670.29 frames. ], batch size: 609, lr: 1.00e-02, grad_scale: 8.0
2024-03-09 18:46:16,626 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.96 vs. limit=10.0
2024-03-09 18:46:37,167 WARNING [optim.py:487] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.908e+01 6.888e+01 7.222e+01 7.907e+01 1.090e+02, threshold=1.444e+02, percent-clipped=0.0
2024-03-09 18:47:13,805 INFO [train.py:997] (3/4) Epoch 50, batch 100, loss[loss=0.1329, simple_loss=0.2276, pruned_loss=0.01912, over 24203.00 frames. ], tot_loss[loss=0.132, simple_loss=0.2232, pruned_loss=0.02042, over 1867371.10 frames. ], batch size: 295, lr: 9.99e-03, grad_scale: 8.0
2024-03-09 18:47:19,505 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=6.83 vs. limit=15.0
2024-03-09 18:47:22,821 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.78 vs. limit=15.0
2024-03-09 18:47:56,121 INFO [scaling.py:1023] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.89 vs. limit=10.0
2024-03-09 18:48:27,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=52546.666666666664, ans=0.0
2024-03-09 18:48:27,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=52546.666666666664, ans=0.0
2024-03-09 18:48:28,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=52546.666666666664, ans=0.0
2024-03-09 18:48:36,459 INFO [train.py:997] (3/4) Epoch 50, batch 150, loss[loss=0.1306, simple_loss=0.2205, pruned_loss=0.02031, over 24169.00 frames. ], tot_loss[loss=0.1331, simple_loss=0.2249, pruned_loss=0.02064, over 2500487.36 frames. ], batch size: 217, lr: 9.97e-03, grad_scale: 8.0
2024-03-09 18:48:48,483 INFO [train.py:1248] (3/4) Done!