2024-03-09 12:56:09,176 INFO [train.py:1065] (0/4) Training started
2024-03-09 12:56:09,193 INFO [train.py:1075] (0/4) Device: cuda:0
2024-03-09 12:56:09,282 INFO [lexicon.py:168] (0/4) Loading pre-compiled data/lang_char/Linv.pt
2024-03-09 12:56:09,334 INFO [train.py:1086] (0/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '2989b0b1186fa6022932804f5b39fbb2781ebf42', 'k2-git-date': 'Fri Nov 24 11:34:10 2023', 'lhotse-version': '1.22.0.dev+git.d8ed1bbb.dirty', 'torch-version': '1.11.0+cu102', 'torch-cuda-available': True, 'torch-cuda-version': '10.2', 'python-version': '3.9', 'icefall-git-branch': 'dev/mdcc', 'icefall-git-sha1': 'f62fc7f0-clean', 'icefall-git-date': 'Sat Mar 9 12:55:42 2024', 'icefall-path': '/star-home/jinzengrui/lib/miniconda3/envs/dev39/lib/python3.9/site-packages/icefall-1.0-py3.9.egg', 'k2-path': '/star-home/jinzengrui/lib/miniconda3/envs/dev39/lib/python3.9/site-packages/k2-1.24.4.dev20231207+cuda10.2.torch1.11.0-py3.9-linux-x86_64.egg/k2/__init__.py', 'lhotse-path': '/star-home/jinzengrui/lib/miniconda3/envs/dev39/lib/python3.9/site-packages/lhotse-1.22.0.dev0+git.d8ed1bbb.dirty-py3.9.egg/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-2-1207150844-f49d8c4f4-c49d5', 'IP address': '10.177.22.19'}, 'world_size': 4, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 30, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('zipformer/exp'), 'lang_dir': PosixPath('data/lang_char'), 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'context_size': 1, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 1000, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'blank_id': 0, 'vocab_size': 4852}
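[editor's note] The learning rates printed in the batch lines below follow from base_lr=0.045, lr_batches=7500, and lr_epochs=3.5 in the parameter dump above. A minimal sketch, assuming icefall's Eden schedule with its usual warmup defaults (warmup_batches=500, warmup_start=0.5 are assumptions, not printed in this log); it reproduces the logged rates:

```python
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5,
            warmup_batches: float = 500.0, warmup_start: float = 0.5) -> float:
    """Eden learning-rate schedule, a sketch of icefall's optim.Eden.
    warmup_batches/warmup_start defaults are assumed, not taken from this log."""
    f_batch = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    f_epoch = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    warmup = 1.0 if batch >= warmup_batches else \
        warmup_start + (1.0 - warmup_start) * batch / warmup_batches
    return base_lr * f_batch * f_epoch * warmup

# The epoch counter is 0 during epoch 1 (step_epoch is called with epoch - 1):
print(eden_lr(0.045, 0, 0))    # ~2.25e-02, as logged at Epoch 1, batch 0
print(eden_lr(0.045, 50, 0))   # ~2.47e-02 (log: 2.48e-02, rounding/off-by-one)
print(eden_lr(0.045, 100, 0))  # ~2.70e-02, as logged at Epoch 1, batch 100
```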
2024-03-09 12:56:09,334 INFO [train.py:1088] (0/4) About to create model
2024-03-09 12:56:09,995 INFO [train.py:1092] (0/4) Number of model parameters: 74470867
2024-03-09 12:56:14,924 INFO [train.py:1107] (0/4) Using DDP
2024-03-09 12:56:15,509 INFO [asr_datamodule.py:368] (0/4) About to get train cuts
2024-03-09 12:56:15,622 INFO [asr_datamodule.py:376] (0/4) About to get valid cuts
2024-03-09 12:56:15,640 INFO [asr_datamodule.py:195] (0/4) About to get Musan cuts
2024-03-09 12:56:18,183 INFO [asr_datamodule.py:200] (0/4) Enable MUSAN
2024-03-09 12:56:18,183 INFO [asr_datamodule.py:223] (0/4) Enable SpecAugment
2024-03-09 12:56:18,183 INFO [asr_datamodule.py:224] (0/4) Time warp factor: 80
2024-03-09 12:56:18,184 INFO [asr_datamodule.py:234] (0/4) Num frame mask: 10
2024-03-09 12:56:18,184 INFO [asr_datamodule.py:247] (0/4) About to create train dataset
2024-03-09 12:56:18,184 INFO [asr_datamodule.py:273] (0/4) Using DynamicBucketingSampler.
2024-03-09 12:56:19,023 INFO [asr_datamodule.py:290] (0/4) About to create train dataloader
2024-03-09 12:56:19,023 INFO [asr_datamodule.py:315] (0/4) About to create dev dataset
2024-03-09 12:56:19,346 INFO [asr_datamodule.py:332] (0/4) About to create dev dataloader
2024-03-09 12:57:18,484 INFO [train.py:997] (0/4) Epoch 1, batch 0, loss[loss=10.43, simple_loss=9.503, pruned_loss=9.26, over 23353.00 frames. ], tot_loss[loss=10.43, simple_loss=9.503, pruned_loss=9.26, over 23353.00 frames. ], batch size: 102, lr: 2.25e-02, grad_scale: 1.0
2024-03-09 12:57:18,486 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 12:57:28,778 INFO [train.py:1029] (0/4) Epoch 1, validation: loss=10.41, simple_loss=9.49, pruned_loss=9.134, over 452978.00 frames.
2024-03-09 12:57:28,779 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 25901MB
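[editor's note] The per-batch loss above is a warmup-weighted combination of the simple (linear) and pruned RNN-T losses, using simple_loss_scale=0.5 and warm_step=2000 from the parameter dump. A hedged sketch of the ramp used in recent icefall zipformer recipes (assumed to match this run's train.py); it reproduces the batch-0 numbers, 9.503 + 0.1*9.26 ≈ 10.43:

```python
def combined_loss(simple_loss: float, pruned_loss: float,
                  batch_idx_train: int, warm_step: int = 2000,
                  simple_loss_scale: float = 0.5) -> float:
    """Warmup-weighted pruned RNN-T loss (sketch of icefall's zipformer ramp)."""
    if batch_idx_train >= warm_step:
        s, p = simple_loss_scale, 1.0
    else:
        frac = batch_idx_train / warm_step
        s = 1.0 - frac * (1.0 - simple_loss_scale)  # simple weight: 1.0 -> 0.5
        p = 0.1 + 0.9 * frac                        # pruned weight: 0.1 -> 1.0
    return s * simple_loss + p * pruned_loss

# Batch 0 above: loss=10.43, simple_loss=9.503, pruned_loss=9.26
assert abs(combined_loss(9.503, 9.26, 0) - 10.43) < 0.01
# Batch 50 below: 0.9875*0.9911 + 0.1225*1.077 ~= 1.111, as logged
assert abs(combined_loss(0.9911, 1.077, 50) - 1.111) < 0.01
```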
2024-03-09 12:57:35,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=0.0, ans=5.0
2024-03-09 12:57:38,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=0.0, ans=0.3
2024-03-09 12:57:42,630 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.90 vs. limit=5.0
2024-03-09 12:57:42,680 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=29.82 vs. limit=7.5
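[editor's note] The Whitening lines are Zipformer diagnostics: each Whiten module logs when its measured whiteness metric exceeds the (scheduled) limit, at which point a corrective gradient penalty is applied; overshoots are routine this early in training. A small helper for ranking the worst offenders in a log like this one (the regex mirrors the line format above; the penalty description is our reading of zipformer's scaling.py, not verified against this exact revision):

```python
import re
from collections import defaultdict

PAT = re.compile(r"Whitening: name=([\w.]+), num_groups=\d+, "
                 r"num_channels=\d+, metric=([\d.]+) vs\. limit=([\d.]+)")

def worst_whitening(lines, top=5):
    """Rank modules by how far their whitening metric overshoots its limit."""
    ratios = defaultdict(float)
    for ln in lines:
        m = PAT.search(ln)
        if m:
            name, metric, limit = m.group(1), float(m.group(2)), float(m.group(3))
            ratios[name] = max(ratios[name], metric / limit)
    return sorted(ratios.items(), key=lambda kv: -kv[1])[:top]
```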
2024-03-09 12:57:45,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=66.66666666666667, ans=0.1975
2024-03-09 12:57:49,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=66.66666666666667, ans=0.0985
2024-03-09 12:57:52,244 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 5.247e+03 5.651e+03 5.908e+03 6.903e+03 6.981e+03, threshold=2.363e+04, percent-clipped=0.0
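[editor's note] These optimizer warnings report quartiles of recent gradient norms plus the clipping threshold in effect. Throughout this log the threshold sits at clipping_scale^2 = 4x the median quartile (e.g. 2.363e+04 ≈ 4 × 5.908e+03 here, and likewise in the later warnings). A sketch that checks the relationship by parsing a WARNING line; the 4x rule is an observation from this run, not a documented formula:

```python
import re

line = ("Clipping_scale=2.0, grad-norm quartiles 5.247e+03 5.651e+03 "
        "5.908e+03 6.903e+03 6.981e+03, threshold=2.363e+04, percent-clipped=0.0")
m = re.search(r"Clipping_scale=([\d.]+), grad-norm quartiles "
              r"((?:[\d.e+-]+\s*){5}), threshold=([\d.e+-]+)", line)
scale = float(m.group(1))
quartiles = [float(x) for x in m.group(2).split()]
threshold = float(m.group(3))
print(threshold / quartiles[2])  # ~4.0 == scale**2 throughout this log
```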
2024-03-09 12:57:58,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=66.66666666666667, ans=0.496875
2024-03-09 12:57:58,667 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=107.87 vs. limit=7.525
2024-03-09 12:58:10,355 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.724e+03 3.453e+03 5.651e+03 6.615e+03 7.215e+03, threshold=2.260e+04, percent-clipped=0.0
2024-03-09 12:58:13,213 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=511.69 vs. limit=7.6
2024-03-09 12:58:15,224 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=232.33 vs. limit=7.6
2024-03-09 12:58:18,454 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=87.97 vs. limit=4.053333333333334
2024-03-09 12:58:24,071 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=223.96 vs. limit=7.575
2024-03-09 12:58:31,644 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=149.27 vs. limit=7.575
2024-03-09 12:58:34,666 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=254.66 vs. limit=7.65
2024-03-09 12:58:35,005 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=277.45 vs. limit=7.575
2024-03-09 12:58:45,198 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=72.53 vs. limit=4.1066666666666665
2024-03-09 12:58:46,102 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.817e+02 1.921e+03 2.306e+03 5.651e+03 7.215e+03, threshold=9.223e+03, percent-clipped=0.0
2024-03-09 12:58:52,634 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=168.22 vs. limit=5.133333333333334
2024-03-09 12:58:59,108 INFO [train.py:997] (0/4) Epoch 1, batch 50, loss[loss=1.111, simple_loss=0.9911, pruned_loss=1.077, over 20140.00 frames. ], tot_loss[loss=3.869, simple_loss=3.562, pruned_loss=3.019, over 1065856.81 frames. ], batch size: 61, lr: 2.48e-02, grad_scale: 0.25
2024-03-09 12:59:00,044 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=367.12 vs. limit=7.625
2024-03-09 12:59:01,816 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=342.93 vs. limit=7.75
2024-03-09 12:59:05,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=333.3333333333333, ans=0.8883333333333333
2024-03-09 12:59:11,230 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=468.64 vs. limit=7.625
2024-03-09 12:59:18,026 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=231.76 vs. limit=7.8
2024-03-09 12:59:18,181 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.37 vs. limit=3.06
2024-03-09 12:59:20,233 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=3.06
2024-03-09 12:59:20,261 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=338.60 vs. limit=7.65
2024-03-09 12:59:26,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=400.0, ans=0.20600000000000002
2024-03-09 12:59:26,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=400.0, ans=0.296
2024-03-09 12:59:32,398 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=315.58 vs. limit=5.2
2024-03-09 12:59:48,062 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=112.75 vs. limit=7.675
2024-03-09 13:00:01,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=533.3333333333334, ans=0.29466666666666663
2024-03-09 13:00:02,712 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=229.75 vs. limit=7.7
2024-03-09 13:00:04,724 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.56 vs. limit=4.213333333333333
2024-03-09 13:00:08,200 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=87.24 vs. limit=7.9
2024-03-09 13:00:13,532 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=7.42 vs. limit=4.24
2024-03-09 13:00:22,131 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=239.35 vs. limit=7.725
2024-03-09 13:00:23,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=600.0, ans=0.294
2024-03-09 13:00:24,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=600.0, ans=5.3
2024-03-09 13:00:31,788 INFO [train.py:997] (0/4) Epoch 1, batch 100, loss[loss=1.057, simple_loss=0.9247, pruned_loss=1.073, over 24275.00 frames. ], tot_loss[loss=2.348, simple_loss=2.139, pruned_loss=1.952, over 1881556.47 frames. ], batch size: 267, lr: 2.70e-02, grad_scale: 0.5
2024-03-09 13:00:32,876 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.32 vs. limit=4.266666666666667
2024-03-09 13:00:37,048 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 5.046e+01 9.193e+01 2.011e+02 2.156e+03 7.215e+03, threshold=4.023e+02, percent-clipped=0.0
2024-03-09 13:00:41,780 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=64.02 vs. limit=7.75
2024-03-09 13:00:43,219 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=56.94 vs. limit=5.333333333333333
2024-03-09 13:00:51,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=733.3333333333334, ans=0.8743333333333334
2024-03-09 13:00:56,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=733.3333333333334, ans=0.04770833333333334
2024-03-09 13:01:02,529 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=13.14 vs. limit=5.183333333333334
2024-03-09 13:01:06,340 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=26.22 vs. limit=8.1
2024-03-09 13:01:10,128 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=49.03 vs. limit=7.8
2024-03-09 13:01:14,956 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=50.68 vs. limit=8.1
2024-03-09 13:01:29,550 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=160.24 vs. limit=7.825
2024-03-09 13:01:53,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=933.3333333333334, ans=0.04708333333333334
2024-03-09 13:01:53,657 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=16.97 vs. limit=5.233333333333333
2024-03-09 13:02:03,201 INFO [train.py:997] (0/4) Epoch 1, batch 150, loss[loss=0.9259, simple_loss=0.7907, pruned_loss=0.9827, over 24134.00 frames. ], tot_loss[loss=1.782, simple_loss=1.604, pruned_loss=1.567, over 2516736.17 frames. ], batch size: 176, lr: 2.93e-02, grad_scale: 0.5
2024-03-09 13:02:11,048 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=218.28 vs. limit=7.875
2024-03-09 13:02:12,725 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.56 vs. limit=4.4
2024-03-09 13:02:14,113 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=70.25 vs. limit=8.25
2024-03-09 13:02:14,486 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=56.43 vs. limit=7.875
2024-03-09 13:02:16,403 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-1.pt
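[editor's note] Each epoch ends with a checkpoint under zipformer/exp/. A hedged sketch of inspecting one offline, assuming the usual icefall layout (a plain dict saved with torch.save, model weights under a "model" key; the key names are assumptions, not verified against this run):

```python
import torch

ckpt = torch.load("zipformer/exp/epoch-1.pt", map_location="cpu")
print(sorted(ckpt.keys()))  # expected: model / optimizer / scheduler / sampler state
n = sum(v.numel() for v in ckpt["model"].values())
print(n)  # roughly the 74,470,867 parameters logged above (buffers may add a little)
```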
2024-03-09 13:03:01,373 INFO [train.py:997] (0/4) Epoch 2, batch 0, loss[loss=1.018, simple_loss=0.8779, pruned_loss=1.021, over 23797.00 frames. ], tot_loss[loss=1.018, simple_loss=0.8779, pruned_loss=1.021, over 23797.00 frames. ], batch size: 447, lr: 2.91e-02, grad_scale: 1.0
2024-03-09 13:03:01,374 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 13:03:11,801 INFO [train.py:1029] (0/4) Epoch 2, validation: loss=0.9516, simple_loss=0.8161, pruned_loss=0.9787, over 452978.00 frames.
2024-03-09 13:03:11,802 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 13:03:14,878 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=363.92 vs. limit=7.895
2024-03-09 13:03:30,549 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=63.67 vs. limit=7.92
2024-03-09 13:03:35,802 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=44.64 vs. limit=8.34
2024-03-09 13:03:35,858 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=193.72 vs. limit=7.92
2024-03-09 13:03:39,280 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=114.96 vs. limit=7.92
2024-03-09 13:03:45,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1186.6666666666667, ans=0.444375
2024-03-09 13:03:45,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1186.6666666666667, ans=0.8584666666666667
2024-03-09 13:03:46,654 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=362.60 vs. limit=7.945
2024-03-09 13:03:47,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1186.6666666666667, ans=0.04629166666666667
2024-03-09 13:03:52,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1186.6666666666667, ans=0.1555
2024-03-09 13:04:06,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1253.3333333333333, ans=0.09216666666666667
2024-03-09 13:04:07,408 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=28.63 vs. limit=7.97
2024-03-09 13:04:07,600 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=80.10 vs. limit=7.97
2024-03-09 13:04:07,894 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=24.08 vs. limit=7.97
2024-03-09 13:04:16,360 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.27 vs. limit=7.97
2024-03-09 13:04:22,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1320.0, ans=0.8538
2024-03-09 13:04:25,867 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=230.87 vs. limit=7.995
2024-03-09 13:04:29,204 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=64.66 vs. limit=5.66
2024-03-09 13:04:31,668 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 4.992e+01 8.885e+01 1.035e+02 1.288e+02 2.193e+02, threshold=2.069e+02, percent-clipped=0.0
2024-03-09 13:04:36,676 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=97.09 vs. limit=7.995
2024-03-09 13:04:39,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1320.0, ans=0.8538
2024-03-09 13:04:42,154 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=38.92 vs. limit=8.02
2024-03-09 13:04:42,729 INFO [train.py:997] (0/4) Epoch 2, batch 50, loss[loss=0.9398, simple_loss=0.8078, pruned_loss=0.898, over 23692.00 frames. ], tot_loss[loss=0.9102, simple_loss=0.778, pruned_loss=0.9183, over 1074146.93 frames. ], batch size: 486, lr: 3.13e-02, grad_scale: 1.0
2024-03-09 13:04:43,624 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=71.38 vs. limit=8.02
2024-03-09 13:04:56,020 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=16.02 vs. limit=5.693333333333333
2024-03-09 13:04:57,963 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.01 vs. limit=5.346666666666667
2024-03-09 13:04:59,627 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=47.54 vs. limit=8.59
2024-03-09 13:05:04,752 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.79 vs. limit=4.581333333333333
2024-03-09 13:05:10,192 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.84 vs. limit=5.363333333333333
2024-03-09 13:05:17,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1453.3333333333333, ans=0.28546666666666665
2024-03-09 13:05:22,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1520.0, ans=0.14300000000000002
2024-03-09 13:05:31,745 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=16.24 vs. limit=8.07
2024-03-09 13:05:47,625 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=81.22 vs. limit=8.69
2024-03-09 13:05:49,321 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=53.78 vs. limit=8.69
2024-03-09 13:05:59,965 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=44.57 vs. limit=8.74
2024-03-09 13:06:02,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1653.3333333333333, ans=0.138
2024-03-09 13:06:07,183 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=16.18 vs. limit=8.12
2024-03-09 13:06:09,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1653.3333333333333, ans=0.4225
2024-03-09 13:06:11,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1653.3333333333333, ans=0.4225
2024-03-09 13:06:16,168 INFO [train.py:997] (0/4) Epoch 2, batch 100, loss[loss=0.9075, simple_loss=0.7764, pruned_loss=0.8376, over 23795.00 frames. ], tot_loss[loss=0.8809, simple_loss=0.7521, pruned_loss=0.8645, over 1877834.46 frames. ], batch size: 447, lr: 3.35e-02, grad_scale: 2.0
2024-03-09 13:06:21,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1720.0, ans=0.419375
2024-03-09 13:06:34,717 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.52 vs. limit=5.446666666666666
2024-03-09 13:06:36,721 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.50 vs. limit=8.84
2024-03-09 13:06:41,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1786.6666666666667, ans=0.41625
2024-03-09 13:06:52,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1853.3333333333333, ans=0.2683333333333333
2024-03-09 13:07:02,209 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=68.35 vs. limit=8.89
2024-03-09 13:07:22,188 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=23.25 vs. limit=8.22
2024-03-09 13:07:25,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1920.0, ans=0.41000000000000003
2024-03-09 13:07:26,233 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=21.12 vs. limit=5.96
2024-03-09 13:07:26,415 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=50.27 vs. limit=8.22
2024-03-09 13:07:35,929 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=195.77 vs. limit=8.245
2024-03-09 13:07:37,966 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.386e+01 8.999e+01 1.029e+02 1.187e+02 2.200e+02, threshold=2.058e+02, percent-clipped=1.0
2024-03-09 13:07:46,563 INFO [train.py:997] (0/4) Epoch 2, batch 150, loss[loss=0.8338, simple_loss=0.7059, pruned_loss=0.763, over 23219.00 frames. ], tot_loss[loss=0.8662, simple_loss=0.7386, pruned_loss=0.8275, over 2515639.78 frames. ], batch size: 102, lr: 3.57e-02, grad_scale: 2.0
2024-03-09 13:07:47,938 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.85 vs. limit=8.27
2024-03-09 13:07:59,675 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-2.pt
2024-03-09 13:08:43,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2106.6666666666665, ans=0.8262666666666667
2024-03-09 13:08:44,927 INFO [train.py:997] (0/4) Epoch 3, batch 0, loss[loss=0.7839, simple_loss=0.6614, pruned_loss=0.7208, over 23163.00 frames. ], tot_loss[loss=0.7839, simple_loss=0.6614, pruned_loss=0.7208, over 23163.00 frames. ], batch size: 102, lr: 3.42e-02, grad_scale: 4.0
2024-03-09 13:08:44,928 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 13:08:54,190 INFO [train.py:1029] (0/4) Epoch 3, validation: loss=0.8556, simple_loss=0.7313, pruned_loss=0.7513, over 452978.00 frames.
2024-03-09 13:08:54,190 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 13:08:55,429 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=35.47 vs. limit=8.29
2024-03-09 13:09:00,497 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.23 vs. limit=9.08
2024-03-09 13:09:01,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2106.6666666666665, ans=0.2366666666666667
2024-03-09 13:09:09,363 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.32 vs. limit=4.842666666666666
2024-03-09 13:09:16,520 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=39.43 vs. limit=8.315
2024-03-09 13:09:19,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2173.3333333333335, ans=0.0511
2024-03-09 13:09:22,046 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=53.72 vs. limit=6.086666666666667
2024-03-09 13:09:25,400 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=27.77 vs. limit=9.13
2024-03-09 13:09:35,460 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.22 vs. limit=6.12
2024-03-09 13:09:53,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2306.6666666666665, ans=0.04279166666666667
2024-03-09 13:09:54,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2306.6666666666665, ans=0.391875
2024-03-09 13:10:26,707 INFO [train.py:997] (0/4) Epoch 3, batch 50, loss[loss=0.7973, simple_loss=0.6776, pruned_loss=0.6859, over 19767.00 frames. ], tot_loss[loss=0.8008, simple_loss=0.6802, pruned_loss=0.7039, over 1068611.49 frames. ], batch size: 59, lr: 3.63e-02, grad_scale: 4.0
2024-03-09 13:10:38,491 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=30.10 vs. limit=8.415
2024-03-09 13:10:39,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2440.0, ans=0.042375
2024-03-09 13:10:42,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2506.6666666666665, ans=0.3825
2024-03-09 13:10:42,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2506.6666666666665, ans=0.2376
2024-03-09 13:10:48,921 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.42 vs. limit=5.626666666666667
2024-03-09 13:10:52,645 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=46.87 vs. limit=8.44
2024-03-09 13:11:04,426 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.19 vs. limit=9.43
2024-03-09 13:11:04,586 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.16 vs. limit=9.43
2024-03-09 13:11:04,784 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=14.16 vs. limit=8.465
2024-03-09 13:11:19,166 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2640.0, ans=0.10099999999999999
2024-03-09 13:11:28,814 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.96 vs. limit=6.32
2024-03-09 13:11:34,500 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.376e+01 1.355e+02 1.829e+02 2.456e+02 5.542e+02, threshold=3.657e+02, percent-clipped=39.0
2024-03-09 13:11:39,146 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=30.81 vs. limit=8.515
2024-03-09 13:11:47,777 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.15 vs. limit=5.676666666666667
2024-03-09 13:11:55,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2773.3333333333335, ans=0.037599999999999995
2024-03-09 13:11:56,951 INFO [train.py:997] (0/4) Epoch 3, batch 100, loss[loss=0.6924, simple_loss=0.5943, pruned_loss=0.5563, over 24243.00 frames. ], tot_loss[loss=0.7716, simple_loss=0.6582, pruned_loss=0.6552, over 1880343.40 frames. ], batch size: 188, lr: 3.84e-02, grad_scale: 8.0
2024-03-09 13:12:03,224 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.49 vs. limit=9.58
2024-03-09 13:12:12,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2840.0, ans=0.366875
2024-03-09 13:12:18,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=2840.0, ans=0.2216
2024-03-09 13:12:19,376 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.28 vs. limit=8.565
2024-03-09 13:12:21,272 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.02 vs. limit=9.629999999999999
2024-03-09 13:12:24,381 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.33 vs. limit=9.629999999999999
2024-03-09 13:12:26,266 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.89 vs. limit=8.565
2024-03-09 13:12:46,236 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.80 vs. limit=6.453333333333333
2024-03-09 13:12:53,814 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.42 vs. limit=6.486666666666666
2024-03-09 13:13:19,575 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.86 vs. limit=8.64
2024-03-09 13:13:21,284 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.75 vs. limit=9.78
2024-03-09 13:13:25,257 INFO [train.py:997] (0/4) Epoch 3, batch 150, loss[loss=0.6125, simple_loss=0.5407, pruned_loss=0.4349, over 24159.00 frames. ], tot_loss[loss=0.7192, simple_loss=0.6192, pruned_loss=0.5825, over 2517629.28 frames. ], batch size: 295, lr: 4.05e-02, grad_scale: 8.0
2024-03-09 13:13:30,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=3106.6666666666665, ans=6.941666666666666
2024-03-09 13:13:33,671 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=7.991e+01
2024-03-09 13:13:35,995 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=22.86 vs. limit=8.665
2024-03-09 13:13:38,207 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-3.pt
2024-03-09 13:14:26,850 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.42 vs. limit=6.58
2024-03-09 13:14:27,472 INFO [train.py:997] (0/4) Epoch 4, batch 0, loss[loss=0.5953, simple_loss=0.5279, pruned_loss=0.4147, over 24073.00 frames. ], tot_loss[loss=0.5953, simple_loss=0.5279, pruned_loss=0.4147, over 24073.00 frames. ], batch size: 365, lr: 3.82e-02, grad_scale: 16.0
2024-03-09 13:14:27,473 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 13:14:37,768 INFO [train.py:1029] (0/4) Epoch 4, validation: loss=0.515, simple_loss=0.4763, pruned_loss=0.3039, over 452978.00 frames.
2024-03-09 13:14:37,769 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 13:14:44,077 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.34 vs. limit=9.870000000000001
2024-03-09 13:15:00,759 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.29 vs. limit=8.71
2024-03-09 13:15:10,612 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=10.39 vs. limit=9.92
2024-03-09 13:15:18,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3293.3333333333335, ans=0.34562499999999996
2024-03-09 13:15:20,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3293.3333333333335, ans=0.07649999999999998
2024-03-09 13:15:23,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3293.3333333333335, ans=0.34562499999999996
2024-03-09 13:15:26,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3293.3333333333335, ans=0.34562499999999996
2024-03-09 13:15:31,400 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.600e+02 2.775e+02 3.449e+02 4.262e+02 1.233e+03, threshold=6.899e+02, percent-clipped=36.0
2024-03-09 13:15:36,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3360.0, ans=0.3425
2024-03-09 13:15:38,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3360.0, ans=0.3425
2024-03-09 13:15:42,536 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=10.28 vs. limit=10.02
2024-03-09 13:15:47,758 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.70 vs. limit=8.785
2024-03-09 13:15:55,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3426.6666666666665, ans=0.339375
2024-03-09 13:15:57,310 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.04 vs. limit=10.07
2024-03-09 13:16:02,389 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=10.86 vs. limit=10.07
2024-03-09 13:16:07,037 INFO [train.py:997] (0/4) Epoch 4, batch 50, loss[loss=0.4472, simple_loss=0.4138, pruned_loss=0.2608, over 20172.00 frames. ], tot_loss[loss=0.5215, simple_loss=0.4711, pruned_loss=0.3366, over 1061168.05 frames. ], batch size: 60, lr: 3.92e-02, grad_scale: 8.0
2024-03-09 13:16:15,208 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.97 vs. limit=10.120000000000001
2024-03-09 13:16:25,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3560.0, ans=0.035
2024-03-09 13:16:48,326 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.66 vs. limit=8.86
2024-03-09 13:16:56,498 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.45 vs. limit=8.86
2024-03-09 13:17:03,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3693.3333333333335, ans=0.21306666666666665
2024-03-09 13:17:04,647 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=10.26 vs. limit=10.27
2024-03-09 13:17:24,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3760.0, ans=0.26239999999999997
2024-03-09 13:17:26,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3760.0, ans=0.07
2024-03-09 13:17:26,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3760.0, ans=0.05899999999999997
2024-03-09 13:17:29,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3760.0, ans=0.32375
2024-03-09 13:17:32,094 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.67 vs. limit=8.935
2024-03-09 13:17:32,113 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.74 vs. limit=8.935
2024-03-09 13:17:32,782 INFO [train.py:997] (0/4) Epoch 4, batch 100, loss[loss=0.4661, simple_loss=0.434, pruned_loss=0.2632, over 24016.00 frames. ], tot_loss[loss=0.4865, simple_loss=0.4458, pruned_loss=0.2959, over 1885565.16 frames. ], batch size: 388, lr: 3.92e-02, grad_scale: 8.0
2024-03-09 13:17:56,788 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.83 vs. limit=8.96
2024-03-09 13:18:06,591 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.27 vs. limit=5.99
2024-03-09 13:18:14,833 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.31 vs. limit=8.985
2024-03-09 13:18:26,601 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.478e+02 2.209e+02 2.728e+02 3.814e+02 7.926e+02, threshold=5.455e+02, percent-clipped=1.0
2024-03-09 13:18:30,780 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.13 vs. limit=5.610666666666667
2024-03-09 13:18:57,703 INFO [train.py:997] (0/4) Epoch 4, batch 150, loss[loss=0.4913, simple_loss=0.4534, pruned_loss=0.2852, over 23722.00 frames. ], tot_loss[loss=0.4589, simple_loss=0.4257, pruned_loss=0.2654, over 2519530.07 frames. ], batch size: 486, lr: 3.91e-02, grad_scale: 8.0
2024-03-09 13:19:01,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4160.0, ans=0.305
2024-03-09 13:19:02,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4160.0, ans=0.2584
2024-03-09 13:19:10,215 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-4.pt
2024-03-09 13:19:56,274 INFO [train.py:997] (0/4) Epoch 5, batch 0, loss[loss=0.3976, simple_loss=0.3804, pruned_loss=0.2011, over 24158.00 frames. ], tot_loss[loss=0.3976, simple_loss=0.3804, pruned_loss=0.2011, over 24158.00 frames. ], batch size: 366, lr: 3.65e-02, grad_scale: 16.0
2024-03-09 13:19:56,275 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 13:20:05,955 INFO [train.py:1029] (0/4) Epoch 5, validation: loss=0.3626, simple_loss=0.3682, pruned_loss=0.1368, over 452978.00 frames.
2024-03-09 13:20:05,956 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 13:20:37,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=4346.666666666667, ans=0.07283333333333333
2024-03-09 13:20:54,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4413.333333333333, ans=0.29312499999999997
2024-03-09 13:21:12,948 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.13 vs. limit=5.792
2024-03-09 13:21:23,619 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.42 vs. limit=6.12
2024-03-09 13:21:30,491 INFO [train.py:997] (0/4) Epoch 5, batch 50, loss[loss=0.3398, simple_loss=0.3376, pruned_loss=0.1468, over 24103.00 frames. ], tot_loss[loss=0.3685, simple_loss=0.3589, pruned_loss=0.1733, over 1069272.23 frames. ], batch size: 165, lr: 3.64e-02, grad_scale: 8.0
2024-03-09 13:21:32,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4546.666666666667, ans=0.009881159420289855
2024-03-09 13:21:44,554 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=11.74 vs. limit=10.91
2024-03-09 13:21:57,639 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.55 vs. limit=9.23
2024-03-09 13:22:09,206 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.206e+02 1.970e+02 2.387e+02 3.231e+02 6.932e+02, threshold=4.775e+02, percent-clipped=2.0
2024-03-09 13:22:24,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4746.666666666667, ans=0.7338666666666667
2024-03-09 13:22:27,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4746.666666666667, ans=0.27749999999999997
2024-03-09 13:22:35,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4813.333333333333, ans=0.7315333333333334
2024-03-09 13:22:45,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4813.333333333333, ans=0.27437500000000004
2024-03-09 13:22:47,535 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.973e+00
2024-03-09 13:22:55,078 INFO [train.py:997] (0/4) Epoch 5, batch 100, loss[loss=0.3652, simple_loss=0.3619, pruned_loss=0.1618, over 24171.00 frames. ], tot_loss[loss=0.3607, simple_loss=0.3537, pruned_loss=0.1657, over 1883421.09 frames. ], batch size: 295, lr: 3.64e-02, grad_scale: 8.0
2024-03-09 13:22:56,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4880.0, ans=0.2512
2024-03-09 13:23:14,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4946.666666666667, ans=0.25053333333333333
2024-03-09 13:23:21,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4946.666666666667, ans=0.268125
2024-03-09 13:23:25,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4946.666666666667, ans=0.2742
2024-03-09 13:23:46,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5080.0, ans=0.26187499999999997
2024-03-09 13:23:51,679 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.93 vs. limit=11.31
2024-03-09 13:23:52,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5080.0, ans=0.26187499999999997
2024-03-09 13:23:55,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5080.0, ans=0.26187499999999997
2024-03-09 13:24:05,289 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.21 vs. limit=6.058666666666667
2024-03-09 13:24:11,529 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.22 vs. limit=11.36
2024-03-09 13:24:19,171 INFO [train.py:997] (0/4) Epoch 5, batch 150, loss[loss=0.3082, simple_loss=0.3134, pruned_loss=0.1242, over 23983.00 frames. ], tot_loss[loss=0.3569, simple_loss=0.3522, pruned_loss=0.1606, over 2528382.17 frames. ], batch size: 142, lr: 3.64e-02, grad_scale: 8.0
2024-03-09 13:24:32,004 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-5.pt
2024-03-09 13:25:15,972 INFO [train.py:997] (0/4) Epoch 6, batch 0, loss[loss=0.3086, simple_loss=0.3176, pruned_loss=0.1181, over 24218.00 frames. ], tot_loss[loss=0.3086, simple_loss=0.3176, pruned_loss=0.1181, over 24218.00 frames. ], batch size: 198, lr: 3.40e-02, grad_scale: 16.0
2024-03-09 13:25:15,973 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 13:25:26,278 INFO [train.py:1029] (0/4) Epoch 6, validation: loss=0.3173, simple_loss=0.3385, pruned_loss=0.1003, over 452978.00 frames.
2024-03-09 13:25:26,279 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 13:25:53,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=5333.333333333333, ans=0.044444444444444446
2024-03-09 13:25:55,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=5333.333333333333, ans=0.25
2024-03-09 13:26:01,735 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.162e+02 1.753e+02 2.102e+02 2.732e+02 4.816e+02, threshold=4.205e+02, percent-clipped=1.0
2024-03-09 13:26:02,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5333.333333333333, ans=0.25
2024-03-09 13:26:16,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5400.0, ans=0.246
2024-03-09 13:26:26,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=5466.666666666667, ans=0.009681159420289855
2024-03-09 13:26:38,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=5533.333333333333, ans=0.7063333333333334
2024-03-09 13:26:56,060 INFO [train.py:997] (0/4) Epoch 6, batch 50, loss[loss=0.2924, simple_loss=0.3065, pruned_loss=0.1054, over 23969.00 frames. ], tot_loss[loss=0.3137, simple_loss=0.3218, pruned_loss=0.1231, over 1071719.59 frames. ], batch size: 142, lr: 3.40e-02, grad_scale: 16.0
2024-03-09 13:26:56,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=5600.0, ans=0.2375
2024-03-09 13:27:26,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5733.333333333333, ans=0.24266666666666667
2024-03-09 13:27:34,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5733.333333333333, ans=0.24266666666666667
2024-03-09 13:27:46,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5800.0, ans=0.22812500000000002
2024-03-09 13:27:55,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=5800.0, ans=0.22812500000000002
2024-03-09 13:28:06,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=5866.666666666667, ans=0.042222222222222223
2024-03-09 13:28:17,513 INFO [train.py:997] (0/4) Epoch 6, batch 100, loss[loss=0.2948, simple_loss=0.3098, pruned_loss=0.1078, over 24268.00 frames. ], tot_loss[loss=0.3142, simple_loss=0.3237, pruned_loss=0.1227, over 1890983.73 frames. ], batch size: 254, lr: 3.40e-02, grad_scale: 8.0
2024-03-09 13:28:26,496 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.55 vs. limit=9.725
2024-03-09 13:28:37,604 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.95 vs. limit=12.0
2024-03-09 13:28:45,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=6000.0, ans=0.21875
2024-03-09 13:28:47,226 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.031e+02 1.395e+02 1.660e+02 2.447e+02 5.591e+02, threshold=3.319e+02, percent-clipped=4.0
2024-03-09 13:29:17,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=6133.333333333333, ans=0.21250000000000002
2024-03-09 13:29:35,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=6200.0, ans=0.20937499999999998
2024-03-09 13:29:40,034 INFO [train.py:997] (0/4) Epoch 6, batch 150, loss[loss=0.2647, simple_loss=0.2819, pruned_loss=0.09391, over 23806.00 frames. ], tot_loss[loss=0.3088, simple_loss=0.3202, pruned_loss=0.1188, over 2528188.92 frames. ], batch size: 129, lr: 3.39e-02, grad_scale: 8.0
2024-03-09 13:29:52,943 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-6.pt
2024-03-09 13:30:37,223 INFO [train.py:997] (0/4) Epoch 7, batch 0, loss[loss=0.2559, simple_loss=0.2774, pruned_loss=0.08407, over 23593.00 frames. ], tot_loss[loss=0.2559, simple_loss=0.2774, pruned_loss=0.08407, over 23593.00 frames. ], batch size: 128, lr: 3.18e-02, grad_scale: 16.0
2024-03-09 13:30:37,224 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 13:30:47,284 INFO [train.py:1029] (0/4) Epoch 7, validation: loss=0.2933, simple_loss=0.3253, pruned_loss=0.08566, over 452978.00 frames.
2024-03-09 13:30:47,285 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 13:31:20,903 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.93 vs. limit=8.193333333333333
2024-03-09 13:32:16,187 INFO [train.py:997] (0/4) Epoch 7, batch 50, loss[loss=0.2581, simple_loss=0.2825, pruned_loss=0.08409, over 24217.00 frames. ], tot_loss[loss=0.2845, simple_loss=0.3038, pruned_loss=0.1016, over 1055468.09 frames. ], batch size: 229, lr: 3.18e-02, grad_scale: 16.0
2024-03-09 13:32:20,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=6653.333333333333, ans=0.188125
2024-03-09 13:32:27,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=6653.333333333333, ans=0.03894444444444445
2024-03-09 13:32:30,835 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.025e+02 1.360e+02 1.605e+02 1.865e+02 3.683e+02, threshold=3.211e+02, percent-clipped=2.0
2024-03-09 13:32:36,652 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.17 vs. limit=10.02
2024-03-09 13:32:44,490 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.93 vs. limit=6.68
2024-03-09 13:32:51,691 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-03-09 13:33:05,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=6853.333333333333, ans=0.17875000000000002
2024-03-09 13:33:12,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=6853.333333333333, ans=0.17875000000000002
2024-03-09 13:33:19,168 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.92 vs. limit=10.095
2024-03-09 13:33:36,452 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.86 vs. limit=4.048
2024-03-09 13:33:37,020 INFO [train.py:997] (0/4) Epoch 7, batch 100, loss[loss=0.2914, simple_loss=0.3133, pruned_loss=0.1053, over 24106.00 frames. ], tot_loss[loss=0.2803, simple_loss=0.3019, pruned_loss=0.09807, over 1872137.09 frames. ], batch size: 344, lr: 3.18e-02, grad_scale: 16.0
2024-03-09 13:33:51,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=6986.666666666667, ans=0.6554666666666666
2024-03-09 13:34:06,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=7053.333333333333, ans=0.22946666666666665
2024-03-09 13:34:27,765 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.90 vs. limit=6.796666666666667
2024-03-09 13:34:31,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=7186.666666666667, ans=0.16312500000000002
2024-03-09 13:34:49,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=7253.333333333333, ans=0.15999999999999998
2024-03-09 13:34:55,333 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.36 vs. limit=10.22
2024-03-09 13:34:58,873 INFO [train.py:997] (0/4) Epoch 7, batch 150, loss[loss=0.2432, simple_loss=0.2734, pruned_loss=0.07554, over 23974.00 frames. ], tot_loss[loss=0.2791, simple_loss=0.3024, pruned_loss=0.09717, over 2506566.13 frames. ], batch size: 142, lr: 3.18e-02, grad_scale: 16.0
2024-03-09 13:35:05,428 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.95 vs. limit=10.245000000000001
2024-03-09 13:35:11,783 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-7.pt
2024-03-09 13:35:57,455 INFO [train.py:997] (0/4) Epoch 8, batch 0, loss[loss=0.2711, simple_loss=0.3, pruned_loss=0.0905, over 24243.00 frames. ], tot_loss[loss=0.2711, simple_loss=0.3, pruned_loss=0.0905, over 24243.00 frames. ], batch size: 311, lr: 2.99e-02, grad_scale: 32.0
2024-03-09 13:35:57,456 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 13:36:07,342 INFO [train.py:1029] (0/4) Epoch 8, validation: loss=0.2797, simple_loss=0.3212, pruned_loss=0.07915, over 452978.00 frames.
2024-03-09 13:36:07,343 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 13:36:08,863 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.023e+02 1.314e+02 1.638e+02 1.955e+02 4.296e+02, threshold=3.277e+02, percent-clipped=3.0
2024-03-09 13:36:38,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=7440.0, ans=9.65
2024-03-09 13:36:56,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=7573.333333333333, ans=0.035111111111111114
2024-03-09 13:37:15,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=7640.0, ans=0.009208695652173913
2024-03-09 13:37:19,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=7640.0, ans=0.009208695652173913
2024-03-09 13:37:28,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=7640.0, ans=0.1
2024-03-09 13:37:28,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=7640.0, ans=0.034833333333333334
2024-03-09 13:37:31,057 INFO [train.py:997] (0/4) Epoch 8, batch 50, loss[loss=0.3296, simple_loss=0.3471, pruned_loss=0.1336, over 23635.00 frames. ], tot_loss[loss=0.264, simple_loss=0.2946, pruned_loss=0.08656, over 1075406.85 frames. ], batch size: 485, lr: 2.99e-02, grad_scale: 32.0
2024-03-09 13:37:39,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=7706.666666666667, ans=0.6302666666666668
2024-03-09 13:37:47,932 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.97 vs. limit=8.886666666666667
2024-03-09 13:37:50,756 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.28 vs. limit=10.415
2024-03-09 13:37:52,316 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.76 vs. limit=6.943333333333333
2024-03-09 13:38:39,073 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.67 vs. limit=10.49
2024-03-09 13:38:51,043 INFO [train.py:997] (0/4) Epoch 8, batch 100, loss[loss=0.2388, simple_loss=0.2735, pruned_loss=0.07445, over 22787.00 frames. ], tot_loss[loss=0.2595, simple_loss=0.2915, pruned_loss=0.08459, over 1880455.62 frames. ], batch size: 85, lr: 2.99e-02, grad_scale: 32.0
2024-03-09 13:38:52,573 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.761e+01 1.115e+02 1.336e+02 1.652e+02 2.844e+02, threshold=2.672e+02, percent-clipped=0.0
2024-03-09 13:38:58,194 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.85 vs. limit=7.01
2024-03-09 13:38:59,614 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.89 vs. limit=7.01
2024-03-09 13:39:06,080 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.95 vs. limit=13.530000000000001
2024-03-09 13:39:14,740 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-03-09 13:39:16,977 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.81 vs. limit=13.58
2024-03-09 13:39:48,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=8240.0, ans=0.125
2024-03-09 13:39:56,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=8306.666666666666, ans=0.21693333333333334
2024-03-09 13:39:57,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=8306.666666666666, ans=0.6092666666666667
2024-03-09 13:40:00,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=8306.666666666666, ans=0.125
2024-03-09 13:40:08,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=8306.666666666666, ans=0.04949747468305833
2024-03-09 13:40:12,947 INFO [train.py:997] (0/4) Epoch 8, batch 150, loss[loss=0.2394, simple_loss=0.2773, pruned_loss=0.07394, over 24264.00 frames. ], tot_loss[loss=0.2576, simple_loss=0.2911, pruned_loss=0.08367, over 2514481.08 frames. ], batch size: 188, lr: 2.99e-02, grad_scale: 16.0
2024-03-09 13:40:25,406 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-8.pt
2024-03-09 13:41:11,732 INFO [train.py:997] (0/4) Epoch 9, batch 0, loss[loss=0.2605, simple_loss=0.2977, pruned_loss=0.0851, over 24156.00 frames. ], tot_loss[loss=0.2605, simple_loss=0.2977, pruned_loss=0.0851, over 24156.00 frames. ], batch size: 366, lr: 2.83e-02, grad_scale: 32.0
2024-03-09 13:41:11,733 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 13:41:21,825 INFO [train.py:1029] (0/4) Epoch 9, validation: loss=0.2624, simple_loss=0.312, pruned_loss=0.07326, over 452978.00 frames.
2024-03-09 13:41:21,826 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 13:41:26,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=8426.666666666666, ans=0.125
2024-03-09 13:42:20,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=8626.666666666666, ans=0.16373333333333334
2024-03-09 13:42:41,967 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.295e+01 1.084e+02 1.217e+02 1.477e+02 3.480e+02, threshold=2.433e+02, percent-clipped=5.0
2024-03-09 13:42:49,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=8760.0, ans=0.125
2024-03-09 13:42:51,114 INFO [train.py:997] (0/4) Epoch 9, batch 50, loss[loss=0.2396, simple_loss=0.2865, pruned_loss=0.06842, over 24059.00 frames. ], tot_loss[loss=0.2398, simple_loss=0.2806, pruned_loss=0.0732, over 1069856.85 frames. ], batch size: 365, lr: 2.83e-02, grad_scale: 32.0
2024-03-09 13:43:06,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=8826.666666666666, ans=0.5910666666666667
2024-03-09 13:43:10,009 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-03-09 13:43:33,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=8893.333333333334, ans=0.008936231884057972
2024-03-09 13:43:39,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=8960.0, ans=0.125
2024-03-09 13:43:52,484 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.15 vs. limit=7.256666666666666
2024-03-09 13:44:03,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=9026.666666666666, ans=0.125
2024-03-09 13:44:05,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=9026.666666666666, ans=0.125
2024-03-09 13:44:08,288 INFO [train.py:997] (0/4) Epoch 9, batch 100, loss[loss=0.2258, simple_loss=0.2734, pruned_loss=0.06407, over 23887.00 frames. ], tot_loss[loss=0.2387, simple_loss=0.2809, pruned_loss=0.07281, over 1888807.17 frames. ], batch size: 129, lr: 2.83e-02, grad_scale: 32.0
2024-03-09 13:44:17,025 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.73 vs. limit=14.32
2024-03-09 13:44:21,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=9093.333333333334, ans=0.5817333333333334
2024-03-09 13:44:27,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=9160.0, ans=0.028500000000000004
2024-03-09 13:44:31,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=9160.0, ans=0.028500000000000004
2024-03-09 13:45:00,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=9293.333333333334, ans=0.125
2024-03-09 13:45:05,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=9293.333333333334, ans=0.04949747468305833
2024-03-09 13:45:07,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=9293.333333333334, ans=0.125
2024-03-09 13:45:20,413 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.522e+01 1.120e+02 1.341e+02 1.607e+02 2.660e+02, threshold=2.681e+02, percent-clipped=5.0
2024-03-09 13:45:27,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=9360.0, ans=0.02766666666666667
2024-03-09 13:45:30,101 INFO [train.py:997] (0/4) Epoch 9, batch 150, loss[loss=0.2241, simple_loss=0.2709, pruned_loss=0.06654, over 24266.00 frames. ], tot_loss[loss=0.2389, simple_loss=0.2821, pruned_loss=0.07342, over 2526261.32 frames. ], batch size: 229, lr: 2.82e-02, grad_scale: 32.0
2024-03-09 13:45:30,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=9426.666666666666, ans=0.027388888888888893
2024-03-09 13:45:42,606 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-9.pt
2024-03-09 13:46:27,250 INFO [train.py:997] (0/4) Epoch 10, batch 0, loss[loss=0.2235, simple_loss=0.2713, pruned_loss=0.06582, over 24276.00 frames. ], tot_loss[loss=0.2235, simple_loss=0.2713, pruned_loss=0.06582, over 24276.00 frames. ], batch size: 254, lr: 2.69e-02, grad_scale: 32.0
2024-03-09 13:46:27,251 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 13:46:37,029 INFO [train.py:1029] (0/4) Epoch 10, validation: loss=0.2538, simple_loss=0.3122, pruned_loss=0.07122, over 452978.00 frames.
2024-03-09 13:46:37,030 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 13:46:45,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=9480.0, ans=0.125
2024-03-09 13:46:50,615 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.78 vs. limit=11.055
2024-03-09 13:47:23,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=9613.333333333334, ans=0.125
2024-03-09 13:47:24,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=9613.333333333334, ans=0.20386666666666667
2024-03-09 13:47:40,482 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.40 vs. limit=11.129999999999999
2024-03-09 13:47:53,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=9746.666666666666, ans=0.035
2024-03-09 13:48:02,513 INFO [train.py:997] (0/4) Epoch 10, batch 50, loss[loss=0.2043, simple_loss=0.2603, pruned_loss=0.05221, over 24263.00 frames. ], tot_loss[loss=0.2297, simple_loss=0.2784, pruned_loss=0.06936, over 1062874.58 frames. ], batch size: 188, lr: 2.68e-02, grad_scale: 32.0
2024-03-09 13:48:32,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=9946.666666666666, ans=0.20053333333333334
2024-03-09 13:48:37,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=9946.666666666666, ans=0.5518666666666667
2024-03-09 13:48:37,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=9946.666666666666, ans=0.025222222222222226
2024-03-09 13:48:43,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=9946.666666666666, ans=0.125
2024-03-09 13:48:58,432 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.812e+01 1.075e+02 1.246e+02 1.479e+02 2.668e+02, threshold=2.491e+02, percent-clipped=0.0
2024-03-09 13:49:00,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=10013.333333333334, ans=0.0
2024-03-09 13:49:21,953 INFO [train.py:997] (0/4) Epoch 10, batch 100, loss[loss=0.2463, simple_loss=0.2959, pruned_loss=0.08028, over 23787.00 frames. ], tot_loss[loss=0.2255, simple_loss=0.2761, pruned_loss=0.06721, over 1871919.70 frames. ], batch size: 447, lr: 2.68e-02, grad_scale: 32.0
2024-03-09 13:49:23,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=10146.666666666666, ans=0.125
2024-03-09 13:49:48,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=10213.333333333334, ans=0.125
2024-03-09 13:50:06,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=10280.0, ans=0.125
2024-03-09 13:50:29,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=10413.333333333334, ans=0.2
2024-03-09 13:50:33,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=10413.333333333334, ans=0.5355333333333334
2024-03-09 13:50:43,576 INFO [train.py:997] (0/4) Epoch 10, batch 150, loss[loss=0.2148, simple_loss=0.2699, pruned_loss=0.06309, over 23085.00 frames. ], tot_loss[loss=0.2241, simple_loss=0.2763, pruned_loss=0.06681, over 2516959.31 frames. ], batch size: 101, lr: 2.68e-02, grad_scale: 32.0
2024-03-09 13:50:55,789 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-10.pt
2024-03-09 13:51:41,272 INFO [train.py:997] (0/4) Epoch 11, batch 0, loss[loss=0.205, simple_loss=0.262, pruned_loss=0.05741, over 24262.00 frames. ], tot_loss[loss=0.205, simple_loss=0.262, pruned_loss=0.05741, over 24262.00 frames. ], batch size: 208, lr: 2.56e-02, grad_scale: 32.0
2024-03-09 13:51:41,273 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 13:51:51,065 INFO [train.py:1029] (0/4) Epoch 11, validation: loss=0.2397, simple_loss=0.3066, pruned_loss=0.06689, over 452978.00 frames.
2024-03-09 13:51:51,066 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 13:51:57,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=10533.333333333334, ans=0.125
2024-03-09 13:52:29,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=10666.666666666666, ans=0.19333333333333336
2024-03-09 13:52:38,414 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.688e+01 1.049e+02 1.183e+02 1.464e+02 2.170e+02, threshold=2.365e+02, percent-clipped=0.0
2024-03-09 13:52:41,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=10733.333333333334, ans=0.125
2024-03-09 13:52:46,671 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.99 vs. limit=15.55
2024-03-09 13:52:47,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=10733.333333333334, ans=0.125
2024-03-09 13:52:58,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=10733.333333333334, ans=0.125
2024-03-09 13:53:18,218 INFO [train.py:997] (0/4) Epoch 11, batch 50, loss[loss=0.2075, simple_loss=0.2698, pruned_loss=0.05717, over 24078.00 frames. ], tot_loss[loss=0.2118, simple_loss=0.2702, pruned_loss=0.0609, over 1066971.02 frames. ], batch size: 344, lr: 2.56e-02, grad_scale: 32.0
2024-03-09 13:53:24,185 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.47 vs. limit=4.63
2024-03-09 13:53:29,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=10866.666666666666, ans=0.5196666666666667
2024-03-09 13:53:32,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=10933.333333333334, ans=0.02111111111111111
2024-03-09 13:53:34,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=10933.333333333334, ans=0.125
2024-03-09 13:54:07,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=11066.666666666666, ans=0.5126666666666667
2024-03-09 13:54:38,693 INFO [train.py:997] (0/4) Epoch 11, batch 100, loss[loss=0.1802, simple_loss=0.2444, pruned_loss=0.04494, over 23777.00 frames. ], tot_loss[loss=0.21, simple_loss=0.27, pruned_loss=0.06026, over 1892829.29 frames. ], batch size: 117, lr: 2.55e-02, grad_scale: 32.0
2024-03-09 13:54:51,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=11200.0, ans=0.188
2024-03-09 13:55:14,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=11333.333333333334, ans=0.18666666666666665
2024-03-09 13:55:17,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=11333.333333333334, ans=0.125
2024-03-09 13:55:23,737 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.186e+01 9.979e+01 1.131e+02 1.409e+02 2.515e+02, threshold=2.263e+02, percent-clipped=1.0
2024-03-09 13:55:58,234 INFO [train.py:997] (0/4) Epoch 11, batch 150, loss[loss=0.2023, simple_loss=0.272, pruned_loss=0.05463, over 24240.00 frames. ], tot_loss[loss=0.2106, simple_loss=0.2715, pruned_loss=0.06134, over 2521132.17 frames. ], batch size: 254, lr: 2.55e-02, grad_scale: 32.0
2024-03-09 13:55:59,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=11533.333333333334, ans=0.18466666666666665
2024-03-09 13:56:10,361 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-11.pt
2024-03-09 13:56:55,629 INFO [train.py:997] (0/4) Epoch 12, batch 0, loss[loss=0.1903, simple_loss=0.2616, pruned_loss=0.04806, over 24261.00 frames. ], tot_loss[loss=0.1903, simple_loss=0.2616, pruned_loss=0.04806, over 24261.00 frames. ], batch size: 254, lr: 2.45e-02, grad_scale: 32.0
2024-03-09 13:56:55,630 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 13:57:03,914 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.5088, 3.5142, 3.5759, 2.8665], device='cuda:0')
2024-03-09 13:57:05,244 INFO [train.py:1029] (0/4) Epoch 12, validation: loss=0.2325, simple_loss=0.3061, pruned_loss=0.06737, over 452978.00 frames.
2024-03-09 13:57:05,244 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 13:57:27,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=11653.333333333334, ans=0.49213333333333337
2024-03-09 13:57:28,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=11653.333333333334, ans=0.018111111111111106
2024-03-09 13:57:44,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=11720.0, ans=0.125
2024-03-09 13:57:48,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=11720.0, ans=0.125
2024-03-09 13:58:10,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=11786.666666666666, ans=0.4874666666666667
2024-03-09 13:58:17,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=11853.333333333334, ans=0.0
2024-03-09 13:58:28,232 INFO [train.py:997] (0/4) Epoch 12, batch 50, loss[loss=0.1965, simple_loss=0.2713, pruned_loss=0.05139, over 24224.00 frames. ], tot_loss[loss=0.1973, simple_loss=0.2653, pruned_loss=0.05485, over 1077039.37 frames. ], batch size: 327, lr: 2.44e-02, grad_scale: 32.0
2024-03-09 13:58:58,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=12053.333333333334, ans=0.125
2024-03-09 13:58:59,612 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.173e+01 9.982e+01 1.112e+02 1.363e+02 2.435e+02, threshold=2.224e+02, percent-clipped=1.0
2024-03-09 13:59:49,525 INFO [train.py:997] (0/4) Epoch 12, batch 100, loss[loss=0.1862, simple_loss=0.2546, pruned_loss=0.0525, over 24235.00 frames. ], tot_loss[loss=0.1959, simple_loss=0.2653, pruned_loss=0.05464, over 1895207.18 frames. ], batch size: 188, lr: 2.44e-02, grad_scale: 32.0
2024-03-09 14:00:02,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=12253.333333333334, ans=0.38380000000000003
2024-03-09 14:00:11,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=12320.0, ans=0.125
2024-03-09 14:00:20,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=12386.666666666666, ans=0.17613333333333334
2024-03-09 14:01:06,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=12520.0, ans=0.008147826086956522
2024-03-09 14:01:06,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=12520.0, ans=0.125
2024-03-09 14:01:09,127 INFO [train.py:997] (0/4) Epoch 12, batch 150, loss[loss=0.1658, simple_loss=0.2463, pruned_loss=0.03773, over 21582.00 frames. ], tot_loss[loss=0.196, simple_loss=0.2664, pruned_loss=0.05543, over 2517775.23 frames. ], batch size: 718, lr: 2.44e-02, grad_scale: 32.0
2024-03-09 14:01:21,367 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-12.pt
2024-03-09 14:02:05,581 INFO [train.py:997] (0/4) Epoch 13, batch 0, loss[loss=0.1865, simple_loss=0.2604, pruned_loss=0.05207, over 24264.00 frames. ], tot_loss[loss=0.1865, simple_loss=0.2604, pruned_loss=0.05207, over 24264.00 frames. ], batch size: 241, lr: 2.34e-02, grad_scale: 32.0
2024-03-09 14:02:05,582 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 14:02:18,484 INFO [train.py:1029] (0/4) Epoch 13, validation: loss=0.2245, simple_loss=0.307, pruned_loss=0.06618, over 452978.00 frames.
2024-03-09 14:02:18,486 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 14:02:21,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=12640.0, ans=16.98
2024-03-09 14:02:37,308 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.720e+01 1.064e+02 1.199e+02 1.343e+02 2.089e+02, threshold=2.398e+02, percent-clipped=0.0
2024-03-09 14:02:37,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=12706.666666666666, ans=0.013722222222222226
2024-03-09 14:02:51,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=12773.333333333334, ans=0.01344444444444444
2024-03-09 14:03:02,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=12773.333333333334, ans=0.125
2024-03-09 14:03:04,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=12773.333333333334, ans=0.125
2024-03-09 14:03:14,223 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.40 vs. limit=12.315000000000001
2024-03-09 14:03:15,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=12840.0, ans=0.013166666666666667
2024-03-09 14:03:19,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=12840.0, ans=0.125
2024-03-09 14:03:42,232 INFO [train.py:997] (0/4) Epoch 13, batch 50, loss[loss=0.1762, simple_loss=0.2494, pruned_loss=0.0494, over 24227.00 frames. ], tot_loss[loss=0.1848, simple_loss=0.2605, pruned_loss=0.05125, over 1061752.39 frames. ], batch size: 229, lr: 2.34e-02, grad_scale: 32.0
2024-03-09 14:04:12,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=13106.666666666666, ans=0.125
2024-03-09 14:04:23,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=13106.666666666666, ans=0.125
2024-03-09 14:04:55,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=13240.0, ans=0.43660000000000004
2024-03-09 14:04:57,421 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.59 vs. limit=11.620000000000001
2024-03-09 14:05:04,146 INFO [train.py:997] (0/4) Epoch 13, batch 100, loss[loss=0.1854, simple_loss=0.2647, pruned_loss=0.05286, over 24217.00 frames. ], tot_loss[loss=0.1833, simple_loss=0.2606, pruned_loss=0.05093, over 1878942.86 frames. ], batch size: 295, lr: 2.34e-02, grad_scale: 32.0
2024-03-09 14:05:22,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=13373.333333333334, ans=0.125
2024-03-09 14:05:22,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=13373.333333333334, ans=0.125
2024-03-09 14:05:24,891 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.077e+01 1.017e+02 1.138e+02 1.327e+02 1.773e+02, threshold=2.276e+02, percent-clipped=0.0
2024-03-09 14:05:29,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=13373.333333333334, ans=0.0
2024-03-09 14:05:35,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=13440.0, ans=0.125
2024-03-09 14:05:42,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=13440.0, ans=0.010666666666666672
2024-03-09 14:05:42,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=13440.0, ans=0.125
2024-03-09 14:06:10,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=13573.333333333334, ans=0.42493333333333333
2024-03-09 14:06:12,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=13573.333333333334, ans=0.0
2024-03-09 14:06:13,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=13573.333333333334, ans=0.16426666666666667
2024-03-09 14:06:23,139 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.85 vs. limit=8.393333333333334
2024-03-09 14:06:24,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=13640.0, ans=0.125
2024-03-09 14:06:25,353 INFO [train.py:997] (0/4) Epoch 13, batch 150, loss[loss=0.1975, simple_loss=0.2819, pruned_loss=0.05652, over 23794.00 frames. ], tot_loss[loss=0.183, simple_loss=0.2617, pruned_loss=0.05095, over 2509743.01 frames. ], batch size: 447, lr: 2.34e-02, grad_scale: 32.0
2024-03-09 14:06:26,242 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.31 vs. limit=9.456
2024-03-09 14:06:27,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=13640.0, ans=0.42260000000000003
2024-03-09 14:06:31,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=13640.0, ans=0.125
2024-03-09 14:06:33,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=13640.0, ans=0.125
2024-03-09 14:06:37,593 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-13.pt
2024-03-09 14:07:22,760 INFO [train.py:997] (0/4) Epoch 14, batch 0, loss[loss=0.1767, simple_loss=0.2619, pruned_loss=0.04574, over 24148.00 frames. ], tot_loss[loss=0.1767, simple_loss=0.2619, pruned_loss=0.04574, over 24148.00 frames. ], batch size: 345, lr: 2.25e-02, grad_scale: 32.0
2024-03-09 14:07:22,760 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 14:07:32,051 INFO [train.py:1029] (0/4) Epoch 14, validation: loss=0.2172, simple_loss=0.3059, pruned_loss=0.06427, over 452978.00 frames.
2024-03-09 14:07:32,052 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 14:07:46,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=13760.0, ans=0.00933333333333334
2024-03-09 14:08:38,098 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-03-09 14:08:48,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=13960.0, ans=0.007834782608695651
2024-03-09 14:08:51,018 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.47 vs. limit=9.584
2024-03-09 14:08:53,286 INFO [train.py:997] (0/4) Epoch 14, batch 50, loss[loss=0.1481, simple_loss=0.2405, pruned_loss=0.02789, over 21469.00 frames. ], tot_loss[loss=0.1787, simple_loss=0.2592, pruned_loss=0.04904, over 1071242.12 frames. ], batch size: 714, lr: 2.25e-02, grad_scale: 32.0
2024-03-09 14:08:59,478 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.009e+01 1.028e+02 1.152e+02 1.303e+02 2.373e+02, threshold=2.304e+02, percent-clipped=1.0
2024-03-09 14:09:23,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=14093.333333333334, ans=0.007805797101449276
2024-03-09 14:09:32,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=14160.0, ans=0.40440000000000004
2024-03-09 14:09:40,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=14226.666666666666, ans=0.125
2024-03-09 14:09:40,652 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.26 vs. limit=18.17
2024-03-09 14:10:12,224 INFO [train.py:997] (0/4) Epoch 14, batch 100, loss[loss=0.1726, simple_loss=0.2566, pruned_loss=0.04432, over 24220.00 frames. ], tot_loss[loss=0.1776, simple_loss=0.2588, pruned_loss=0.04818, over 1885745.77 frames. ], batch size: 241, lr: 2.25e-02, grad_scale: 32.0
2024-03-09 14:10:40,713 INFO [scaling.py:1023] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.39 vs. limit=6.8853333333333335
2024-03-09 14:10:47,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=14493.333333333334, ans=0.125
2024-03-09 14:11:08,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=14560.0, ans=0.125
2024-03-09 14:11:22,764 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.17 vs. limit=9.850666666666665
2024-03-09 14:11:35,856 INFO [train.py:997] (0/4) Epoch 14, batch 150, loss[loss=0.1914, simple_loss=0.2777, pruned_loss=0.05254, over 23993.00 frames. ], tot_loss[loss=0.179, simple_loss=0.2608, pruned_loss=0.04857, over 2514372.92 frames. ], batch size: 388, lr: 2.25e-02, grad_scale: 32.0
2024-03-09 14:11:41,695 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.629e+01 9.664e+01 1.070e+02 1.194e+02 2.380e+02, threshold=2.140e+02, percent-clipped=1.0
2024-03-09 14:11:47,749 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-14.pt
2024-03-09 14:12:33,830 INFO [train.py:997] (0/4) Epoch 15, batch 0, loss[loss=0.1984, simple_loss=0.283, pruned_loss=0.05691, over 23768.00 frames. ], tot_loss[loss=0.1984, simple_loss=0.283, pruned_loss=0.05691, over 23768.00 frames. ], batch size: 486, lr: 2.17e-02, grad_scale: 32.0
2024-03-09 14:12:33,831 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 14:12:40,159 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([1.2567, 2.3136, 2.0556, 2.0763, 2.1980, 2.1469, 2.0836, 2.2145], device='cuda:0')
2024-03-09 14:12:43,268 INFO [train.py:1029] (0/4) Epoch 15, validation: loss=0.2144, simple_loss=0.3029, pruned_loss=0.06295, over 452978.00 frames.
2024-03-09 14:12:43,269 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 14:13:01,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=14813.333333333334, ans=0.125
2024-03-09 14:13:14,630 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.90 vs. limit=8.703333333333333
2024-03-09 14:13:24,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=14880.0, ans=0.3792
2024-03-09 14:14:02,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=15013.333333333334, ans=0.09899494936611666
2024-03-09 14:14:04,899 INFO [train.py:997] (0/4) Epoch 15, batch 50, loss[loss=0.1676, simple_loss=0.2478, pruned_loss=0.04363, over 24107.00 frames. ], tot_loss[loss=0.1756, simple_loss=0.2578, pruned_loss=0.04672, over 1067826.22 frames. ], batch size: 176, lr: 2.17e-02, grad_scale: 32.0
2024-03-09 14:14:08,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=15080.0, ans=0.3722
2024-03-09 14:15:02,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=15280.0, ans=0.125
2024-03-09 14:15:08,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=15346.666666666666, ans=0.125
2024-03-09 14:15:10,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=15346.666666666666, ans=0.125
2024-03-09 14:15:19,050 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.102e+01 1.026e+02 1.164e+02 1.400e+02 2.237e+02, threshold=2.327e+02, percent-clipped=1.0
2024-03-09 14:15:27,120 INFO [train.py:997] (0/4) Epoch 15, batch 100, loss[loss=0.1698, simple_loss=0.2591, pruned_loss=0.04019, over 24259.00 frames. ], tot_loss[loss=0.1748, simple_loss=0.2574, pruned_loss=0.04613, over 1886144.59 frames. ], batch size: 295, lr: 2.17e-02, grad_scale: 32.0
2024-03-09 14:15:30,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=15413.333333333334, ans=0.0024444444444444435
2024-03-09 14:15:54,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=15480.0, ans=0.007504347826086957
2024-03-09 14:15:56,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=15480.0, ans=0.007504347826086957
2024-03-09 14:16:46,435 INFO [train.py:997] (0/4) Epoch 15, batch 150, loss[loss=0.1882, simple_loss=0.2646, pruned_loss=0.05595, over 23887.00 frames. ], tot_loss[loss=0.1737, simple_loss=0.2564, pruned_loss=0.04554, over 2498734.50 frames. ], batch size: 153, lr: 2.16e-02, grad_scale: 32.0
2024-03-09 14:16:58,744 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-15.pt
2024-03-09 14:17:45,381 INFO [train.py:997] (0/4) Epoch 16, batch 0, loss[loss=0.169, simple_loss=0.2606, pruned_loss=0.03866, over 23966.00 frames. ], tot_loss[loss=0.169, simple_loss=0.2606, pruned_loss=0.03866, over 23966.00 frames. ], batch size: 387, lr: 2.09e-02, grad_scale: 32.0
2024-03-09 14:17:45,382 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 14:17:55,604 INFO [train.py:1029] (0/4) Epoch 16, validation: loss=0.2134, simple_loss=0.3039, pruned_loss=0.06146, over 452978.00 frames.
2024-03-09 14:17:55,604 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 14:18:08,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=15800.0, ans=0.14200000000000002
2024-03-09 14:18:36,826 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.05 vs. limit=13.475
2024-03-09 14:18:51,994 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.08 vs. limit=13.5
2024-03-09 14:19:01,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=16000.0, ans=0.0
2024-03-09 14:19:03,237 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.463e+01 8.632e+01 1.007e+02 1.180e+02 1.868e+02, threshold=2.014e+02, percent-clipped=0.0
2024-03-09 14:19:21,889 INFO [train.py:997] (0/4) Epoch 16, batch 50, loss[loss=0.1638, simple_loss=0.2484, pruned_loss=0.0396, over 24164.00 frames. ], tot_loss[loss=0.1662, simple_loss=0.2508, pruned_loss=0.04081, over 1074508.98 frames. ], batch size: 217, lr: 2.09e-02, grad_scale: 32.0
2024-03-09 14:19:25,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=16133.333333333334, ans=0.0
2024-03-09 14:19:28,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=16133.333333333334, ans=0.125
2024-03-09 14:19:29,359 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.23 vs. limit=19.6
2024-03-09 14:19:39,059 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-03-09 14:19:39,732 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.66 vs. limit=5.43
2024-03-09 14:20:05,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=16266.666666666666, ans=0.125
2024-03-09 14:20:26,003 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.50 vs. limit=13.2
2024-03-09 14:20:28,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=16400.0, ans=0.0
2024-03-09 14:20:38,766 INFO [train.py:997] (0/4) Epoch 16, batch 100, loss[loss=0.1717, simple_loss=0.2491, pruned_loss=0.04711, over 24232.00 frames. ], tot_loss[loss=0.1677, simple_loss=0.2517, pruned_loss=0.04187, over 1892200.52 frames. ], batch size: 241, lr: 2.09e-02, grad_scale: 32.0
2024-03-09 14:21:16,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=16600.0, ans=0.025
2024-03-09 14:21:36,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=16666.666666666668, ans=0.1333333333333333
2024-03-09 14:21:43,561 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.243e+01 8.931e+01 9.706e+01 1.091e+02 1.368e+02, threshold=1.941e+02, percent-clipped=0.0
2024-03-09 14:22:02,412 INFO [train.py:997] (0/4) Epoch 16, batch 150, loss[loss=0.1989, simple_loss=0.2811, pruned_loss=0.05834, over 23724.00 frames. ], tot_loss[loss=0.1686, simple_loss=0.2528, pruned_loss=0.04221, over 2520276.84 frames. ], batch size: 486, lr: 2.09e-02, grad_scale: 32.0
2024-03-09 14:22:07,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=16800.0, ans=0.132
2024-03-09 14:22:14,613 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-16.pt
2024-03-09 14:23:00,943 INFO [train.py:997] (0/4) Epoch 17, batch 0, loss[loss=0.163, simple_loss=0.2512, pruned_loss=0.03741, over 24204.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2512, pruned_loss=0.03741, over 24204.00 frames. ], batch size: 295, lr: 2.02e-02, grad_scale: 32.0
2024-03-09 14:23:00,943 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 14:23:11,390 INFO [train.py:1029] (0/4) Epoch 17, validation: loss=0.215, simple_loss=0.3066, pruned_loss=0.06175, over 452978.00 frames.
2024-03-09 14:23:11,391 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 14:23:27,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=16920.0, ans=0.0
2024-03-09 14:23:49,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=16986.666666666668, ans=0.125
2024-03-09 14:23:51,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=16986.666666666668, ans=0.0
2024-03-09 14:24:19,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=17120.0, ans=0.125
2024-03-09 14:24:36,071 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.62 vs. limit=20.39
2024-03-09 14:24:36,337 INFO [train.py:997] (0/4) Epoch 17, batch 50, loss[loss=0.1638, simple_loss=0.2541, pruned_loss=0.03677, over 24265.00 frames. ], tot_loss[loss=0.1667, simple_loss=0.2512, pruned_loss=0.04117, over 1074718.53 frames. ], batch size: 267, lr: 2.02e-02, grad_scale: 32.0
2024-03-09 14:24:59,753 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=7.830e-03
2024-03-09 14:25:04,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=17253.333333333332, ans=0.125
2024-03-09 14:25:22,879 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.440e+01 9.326e+01 1.031e+02 1.175e+02 1.521e+02, threshold=2.062e+02, percent-clipped=0.0
2024-03-09 14:25:37,793 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.89 vs. limit=14.044999999999998
2024-03-09 14:25:49,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=17453.333333333332, ans=0.125
2024-03-09 14:25:56,613 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.95 vs. limit=20.64
2024-03-09 14:25:57,090 INFO [train.py:997] (0/4) Epoch 17, batch 100, loss[loss=0.1625, simple_loss=0.2498, pruned_loss=0.03755, over 24270.00 frames. ], tot_loss[loss=0.1669, simple_loss=0.2516, pruned_loss=0.0411, over 1882719.50 frames. ], batch size: 254, lr: 2.02e-02, grad_scale: 32.0
2024-03-09 14:26:11,731 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.82 vs. limit=5.638
2024-03-09 14:26:21,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=17586.666666666668, ans=0.09899494936611666
2024-03-09 14:26:21,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=17586.666666666668, ans=0.125
2024-03-09 14:27:15,877 INFO [train.py:997] (0/4) Epoch 17, batch 150, loss[loss=0.1671, simple_loss=0.2548, pruned_loss=0.03968, over 24150.00 frames. ], tot_loss[loss=0.1668, simple_loss=0.2514, pruned_loss=0.04109, over 2517570.99 frames. ], batch size: 345, lr: 2.02e-02, grad_scale: 32.0
2024-03-09 14:27:28,466 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-17.pt
2024-03-09 14:28:12,293 INFO [train.py:997] (0/4) Epoch 18, batch 0, loss[loss=0.1544, simple_loss=0.2397, pruned_loss=0.03458, over 24280.00 frames. ], tot_loss[loss=0.1544, simple_loss=0.2397, pruned_loss=0.03458, over 24280.00 frames. ], batch size: 229, lr: 1.96e-02, grad_scale: 32.0
2024-03-09 14:28:12,294 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 14:28:22,756 INFO [train.py:1029] (0/4) Epoch 18, validation: loss=0.213, simple_loss=0.3039, pruned_loss=0.06107, over 452978.00 frames.
2024-03-09 14:28:22,756 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 14:28:29,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=17906.666666666668, ans=0.125
2024-03-09 14:28:32,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=17906.666666666668, ans=0.0
2024-03-09 14:28:46,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=17973.333333333332, ans=0.125
2024-03-09 14:28:47,088 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.11 vs. limit=13.986666666666666
2024-03-09 14:29:02,462 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.00 vs. limit=9.51
2024-03-09 14:29:02,778 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.169e+01 8.782e+01 9.645e+01 1.059e+02 1.496e+02, threshold=1.929e+02, percent-clipped=0.0
2024-03-09 14:29:10,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=18040.0, ans=0.125
2024-03-09 14:29:11,444 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.53 vs. limit=11.216000000000001
2024-03-09 14:29:22,502 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.49 vs. limit=14.29
2024-03-09 14:29:38,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=18173.333333333332, ans=0.125
2024-03-09 14:29:39,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=18173.333333333332, ans=0.006918840579710145
2024-03-09 14:29:44,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=18240.0, ans=0.125
2024-03-09 14:29:45,631 INFO [train.py:997] (0/4) Epoch 18, batch 50, loss[loss=0.1544, simple_loss=0.2399, pruned_loss=0.03446, over 24260.00 frames. ], tot_loss[loss=0.1633, simple_loss=0.2474, pruned_loss=0.03955, over 1069503.57 frames. ], batch size: 198, lr: 1.96e-02, grad_scale: 32.0
2024-03-09 14:30:20,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=18373.333333333332, ans=0.25693333333333346
2024-03-09 14:30:33,256 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.06 vs. limit=14.415
2024-03-09 14:30:37,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=18440.0, ans=0.0
2024-03-09 14:30:41,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=18440.0, ans=0.0
2024-03-09 14:31:00,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=18506.666666666668, ans=0.006846376811594203
2024-03-09 14:31:06,274 INFO [train.py:997] (0/4) Epoch 18, batch 100, loss[loss=0.163, simple_loss=0.249, pruned_loss=0.03848, over 24302.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.2477, pruned_loss=0.03865, over 1882939.63 frames. ], batch size: 241, lr: 1.96e-02, grad_scale: 32.0
2024-03-09 14:31:15,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=18573.333333333332, ans=0.06426666666666667
2024-03-09 14:31:39,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=18706.666666666668, ans=0.125
2024-03-09 14:31:41,829 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.054e+01 8.645e+01 9.593e+01 1.057e+02 1.559e+02, threshold=1.919e+02, percent-clipped=0.0
2024-03-09 14:31:49,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=18706.666666666668, ans=0.125
2024-03-09 14:31:52,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=18706.666666666668, ans=0.0
2024-03-09 14:31:56,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=18773.333333333332, ans=0.11226666666666668
2024-03-09 14:32:09,675 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=5.8260000000000005
2024-03-09 14:32:26,168 INFO [train.py:997] (0/4) Epoch 18, batch 150, loss[loss=0.1676, simple_loss=0.2505, pruned_loss=0.04236, over 24078.00 frames. ], tot_loss[loss=0.1638, simple_loss=0.2492, pruned_loss=0.03919, over 2521185.23 frames. ], batch size: 165, lr: 1.95e-02, grad_scale: 32.0
2024-03-09 14:32:38,333 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-18.pt
2024-03-09 14:33:23,372 INFO [train.py:997] (0/4) Epoch 19, batch 0, loss[loss=0.1729, simple_loss=0.264, pruned_loss=0.04085, over 24022.00 frames. ], tot_loss[loss=0.1729, simple_loss=0.264, pruned_loss=0.04085, over 24022.00 frames. ], batch size: 416, lr: 1.90e-02, grad_scale: 32.0
2024-03-09 14:33:23,373 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 14:33:30,781 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.4285, 5.0987, 5.3784, 5.1244], device='cuda:0')
2024-03-09 14:33:35,286 INFO [train.py:1029] (0/4) Epoch 19, validation: loss=0.2133, simple_loss=0.3046, pruned_loss=0.061, over 452978.00 frames.
2024-03-09 14:33:35,287 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 14:33:38,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=18960.0, ans=0.024620000000000003
2024-03-09 14:34:06,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=19026.666666666668, ans=0.125
2024-03-09 14:34:31,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=19160.0, ans=0.10840000000000002
2024-03-09 14:34:35,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=19160.0, ans=0.125
2024-03-09 14:34:46,993 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.30 vs. limit=14.613333333333335
2024-03-09 14:34:51,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=19226.666666666668, ans=0.006689855072463767
2024-03-09 14:34:53,443 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.06 vs. limit=14.71
2024-03-09 14:34:55,506 INFO [train.py:997] (0/4) Epoch 19, batch 50, loss[loss=0.165, simple_loss=0.2532, pruned_loss=0.03836, over 24187.00 frames. ], tot_loss[loss=0.1603, simple_loss=0.2465, pruned_loss=0.03701, over 1071248.87 frames. ], batch size: 295, lr: 1.90e-02, grad_scale: 32.0
2024-03-09 14:34:55,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=19293.333333333332, ans=0.1070666666666667
2024-03-09 14:35:08,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=19293.333333333332, ans=0.00667536231884058
2024-03-09 14:35:08,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=19293.333333333332, ans=0.0
2024-03-09 14:35:17,297 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.127e+01 8.675e+01 9.444e+01 1.046e+02 1.924e+02, threshold=1.889e+02, percent-clipped=1.0
2024-03-09 14:35:42,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=19493.333333333332, ans=0.49239999999999995
2024-03-09 14:35:47,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=19493.333333333332, ans=0.05
2024-03-09 14:35:50,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=19493.333333333332, ans=0.0
2024-03-09 14:35:58,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=19560.0, ans=5.934
2024-03-09 14:36:16,165 INFO [train.py:997] (0/4) Epoch 19, batch 100, loss[loss=0.1576, simple_loss=0.2467, pruned_loss=0.03418, over 24199.00 frames. ], tot_loss[loss=0.1615, simple_loss=0.2474, pruned_loss=0.03783, over 1882403.70 frames. ], batch size: 280, lr: 1.90e-02, grad_scale: 32.0
2024-03-09 14:36:21,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=19626.666666666668, ans=0.10373333333333334
2024-03-09 14:36:27,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=19626.666666666668, ans=0.125
2024-03-09 14:36:32,737 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.18 vs. limit=14.86
2024-03-09 14:36:36,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=19693.333333333332, ans=0.21073333333333344
2024-03-09 14:36:39,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=19693.333333333332, ans=0.0
2024-03-09 14:36:43,199 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.34 vs. limit=9.923333333333332
2024-03-09 14:36:53,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=19760.0, ans=0.10240000000000002
2024-03-09 14:36:54,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=19760.0, ans=0.10240000000000002
2024-03-09 14:37:14,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=19826.666666666668, ans=0.006559420289855072
2024-03-09 14:37:27,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=19893.333333333332, ans=0.0
2024-03-09 14:37:36,535 INFO [train.py:997] (0/4) Epoch 19, batch 150, loss[loss=0.2038, simple_loss=0.281, pruned_loss=0.06329, over 23262.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2488, pruned_loss=0.03824, over 2517229.94 frames. ], batch size: 534, lr: 1.89e-02, grad_scale: 32.0
2024-03-09 14:37:43,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=19960.0, ans=0.125
2024-03-09 14:37:49,377 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-19.pt
2024-03-09 14:38:31,428 INFO [train.py:997] (0/4) Epoch 20, batch 0, loss[loss=0.1662, simple_loss=0.2468, pruned_loss=0.04281, over 24051.00 frames. ], tot_loss[loss=0.1662, simple_loss=0.2468, pruned_loss=0.04281, over 24051.00 frames. ], batch size: 176, lr: 1.85e-02, grad_scale: 32.0
2024-03-09 14:38:31,429 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 14:38:38,128 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.1197, 3.9310, 3.8895, 3.4368], device='cuda:0')
2024-03-09 14:38:40,964 INFO [train.py:1029] (0/4) Epoch 20, validation: loss=0.2111, simple_loss=0.3031, pruned_loss=0.05952, over 452978.00 frames.
2024-03-09 14:38:40,964 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 14:38:53,196 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.462e+01 8.448e+01 9.307e+01 1.038e+02 2.078e+02, threshold=1.861e+02, percent-clipped=1.0
2024-03-09 14:38:53,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=20013.333333333332, ans=0.1
2024-03-09 14:39:37,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=20213.333333333332, ans=0.125
2024-03-09 14:39:57,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=20280.0, ans=0.05
2024-03-09 14:39:59,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=20280.0, ans=0.0
2024-03-09 14:40:03,546 INFO [train.py:997] (0/4) Epoch 20, batch 50, loss[loss=0.1468, simple_loss=0.2296, pruned_loss=0.03198, over 23582.00 frames. ], tot_loss[loss=0.1552, simple_loss=0.241, pruned_loss=0.0347, over 1076970.64 frames. ], batch size: 128, lr: 1.84e-02, grad_scale: 32.0
2024-03-09 14:41:11,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=20613.333333333332, ans=0.09899494936611666
2024-03-09 14:41:25,658 INFO [train.py:997] (0/4) Epoch 20, batch 100, loss[loss=0.1671, simple_loss=0.2498, pruned_loss=0.04219, over 24104.00 frames. ], tot_loss[loss=0.1591, simple_loss=0.2448, pruned_loss=0.03668, over 1894606.49 frames. ], batch size: 165, lr: 1.84e-02, grad_scale: 32.0
2024-03-09 14:41:34,816 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.448e+01 8.010e+01 8.832e+01 9.695e+01 1.353e+02, threshold=1.766e+02, percent-clipped=0.0
2024-03-09 14:41:40,236 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.63 vs. limit=12.0
2024-03-09 14:41:49,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=20746.666666666668, ans=0.2
2024-03-09 14:42:04,612 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.79 vs. limit=15.0
2024-03-09 14:42:04,942 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.21 vs. limit=15.0
2024-03-09 14:42:07,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=20813.333333333332, ans=0.07
2024-03-09 14:42:36,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=20946.666666666668, ans=0.0
2024-03-09 14:42:44,020 INFO [train.py:997] (0/4) Epoch 20, batch 150, loss[loss=0.1465, simple_loss=0.232, pruned_loss=0.03051, over 24252.00 frames. ], tot_loss[loss=0.1585, simple_loss=0.2445, pruned_loss=0.03627, over 2518931.36 frames. ], batch size: 229, lr: 1.84e-02, grad_scale: 32.0
2024-03-09 14:42:52,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=21013.333333333332, ans=0.125
2024-03-09 14:42:53,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=21013.333333333332, ans=0.006301449275362319
2024-03-09 14:42:56,093 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-20.pt
2024-03-09 14:43:39,749 INFO [train.py:997] (0/4) Epoch 21, batch 0, loss[loss=0.1557, simple_loss=0.2395, pruned_loss=0.03593, over 22561.00 frames. ], tot_loss[loss=0.1557, simple_loss=0.2395, pruned_loss=0.03593, over 22561.00 frames. ], batch size: 85, lr: 1.79e-02, grad_scale: 32.0
2024-03-09 14:43:39,750 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 14:43:49,466 INFO [train.py:1029] (0/4) Epoch 21, validation: loss=0.2106, simple_loss=0.3015, pruned_loss=0.05984, over 452978.00 frames.
2024-03-09 14:43:49,467 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 14:44:14,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=21133.333333333332, ans=0.2
2024-03-09 14:44:25,168 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.598e-02
2024-03-09 14:44:26,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=21200.0, ans=0.125
2024-03-09 14:44:41,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=21266.666666666668, ans=0.125
2024-03-09 14:45:10,283 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.134e+01 8.236e+01 9.284e+01 1.075e+02 1.651e+02, threshold=1.857e+02, percent-clipped=0.0
2024-03-09 14:45:13,759 INFO [train.py:997] (0/4) Epoch 21, batch 50, loss[loss=0.1334, simple_loss=0.2286, pruned_loss=0.01914, over 21496.00 frames. ], tot_loss[loss=0.1576, simple_loss=0.2452, pruned_loss=0.03499, over 1066137.56 frames. ], batch size: 717, lr: 1.79e-02, grad_scale: 32.0
2024-03-09 14:45:18,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=21400.0, ans=0.125
2024-03-09 14:45:20,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=21400.0, ans=0.1
2024-03-09 14:45:23,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=21400.0, ans=0.1
2024-03-09 14:45:37,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=21466.666666666668, ans=0.0
2024-03-09 14:45:57,078 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.40 vs. limit=15.0
2024-03-09 14:46:00,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=21600.0, ans=0.2
2024-03-09 14:46:29,323 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.74 vs. limit=15.0
2024-03-09 14:46:32,881 INFO [train.py:997] (0/4) Epoch 21, batch 100, loss[loss=0.1936, simple_loss=0.2731, pruned_loss=0.05707, over 23234.00 frames. ], tot_loss[loss=0.1586, simple_loss=0.2461, pruned_loss=0.03551, over 1889581.54 frames. ], batch size: 534, lr: 1.79e-02, grad_scale: 64.0
2024-03-09 14:47:29,119 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.37 vs. limit=22.5
2024-03-09 14:47:35,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=21933.333333333332, ans=0.125
2024-03-09 14:47:48,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=22000.0, ans=0.95
2024-03-09 14:47:51,969 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.744e+01 8.144e+01 8.919e+01 1.026e+02 1.301e+02, threshold=1.784e+02, percent-clipped=0.0
2024-03-09 14:47:55,065 INFO [train.py:997] (0/4) Epoch 21, batch 150, loss[loss=0.133, simple_loss=0.2276, pruned_loss=0.0192, over 21551.00 frames. ], tot_loss[loss=0.1602, simple_loss=0.2472, pruned_loss=0.03658, over 2523300.87 frames. ], batch size: 718, lr: 1.79e-02, grad_scale: 64.0
2024-03-09 14:48:07,327 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-21.pt
2024-03-09 14:48:50,783 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.72 vs. limit=12.0
2024-03-09 14:48:51,238 INFO [train.py:997] (0/4) Epoch 22, batch 0, loss[loss=0.1625, simple_loss=0.2543, pruned_loss=0.03538, over 24015.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.2543, pruned_loss=0.03538, over 24015.00 frames. ], batch size: 416, lr: 1.74e-02, grad_scale: 64.0
2024-03-09 14:48:51,239 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 14:49:00,964 INFO [train.py:1029] (0/4) Epoch 22, validation: loss=0.2117, simple_loss=0.3028, pruned_loss=0.06033, over 452978.00 frames.
2024-03-09 14:49:00,965 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 14:49:01,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=22120.0, ans=0.1
2024-03-09 14:49:12,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=22120.0, ans=0.2
2024-03-09 14:49:29,886 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.50 vs. limit=15.0
2024-03-09 14:49:56,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=22320.0, ans=0.2
2024-03-09 14:49:57,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=22320.0, ans=0.1
2024-03-09 14:50:07,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=22386.666666666668, ans=0.0
2024-03-09 14:50:23,732 INFO [train.py:997] (0/4) Epoch 22, batch 50, loss[loss=0.1616, simple_loss=0.2404, pruned_loss=0.04134, over 23927.00 frames. ], tot_loss[loss=0.1542, simple_loss=0.2416, pruned_loss=0.0334, over 1068791.91 frames. ], batch size: 153, lr: 1.74e-02, grad_scale: 64.0
2024-03-09 14:50:33,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=22453.333333333332, ans=0.07
2024-03-09 14:50:38,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=22520.0, ans=0.0
2024-03-09 14:51:04,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=22586.666666666668, ans=0.0
2024-03-09 14:51:05,295 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.82 vs. limit=15.0
2024-03-09 14:51:18,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=22653.333333333332, ans=0.125
2024-03-09 14:51:28,016 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.973e+01 8.132e+01 8.918e+01 9.986e+01 1.265e+02, threshold=1.784e+02, percent-clipped=0.0
2024-03-09 14:51:45,175 INFO [train.py:997] (0/4) Epoch 22, batch 100, loss[loss=0.1544, simple_loss=0.2491, pruned_loss=0.02986, over 24063.00 frames. ], tot_loss[loss=0.1541, simple_loss=0.2412, pruned_loss=0.03344, over 1880311.48 frames. ], batch size: 365, lr: 1.74e-02, grad_scale: 64.0
2024-03-09 14:51:48,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=22786.666666666668, ans=0.125
2024-03-09 14:51:49,463 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.12 vs. limit=15.0
2024-03-09 14:52:05,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=22853.333333333332, ans=0.005901449275362319
2024-03-09 14:52:20,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=22920.0, ans=0.1
2024-03-09 14:52:28,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=22920.0, ans=0.125
2024-03-09 14:52:29,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=22920.0, ans=0.125
2024-03-09 14:52:37,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=22986.666666666668, ans=0.0
2024-03-09 14:52:52,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=23053.333333333332, ans=0.0
2024-03-09 14:52:52,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=23053.333333333332, ans=0.5
2024-03-09 14:53:00,363 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.62 vs. limit=6.0
2024-03-09 14:53:01,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=23053.333333333332, ans=0.005857971014492754
2024-03-09 14:53:05,950 INFO [train.py:997] (0/4) Epoch 22, batch 150, loss[loss=0.1542, simple_loss=0.2448, pruned_loss=0.03185, over 24194.00 frames. ], tot_loss[loss=0.1545, simple_loss=0.2424, pruned_loss=0.03335, over 2516578.38 frames. ], batch size: 241, lr: 1.74e-02, grad_scale: 64.0
2024-03-09 14:53:15,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=23120.0, ans=0.125
2024-03-09 14:53:18,515 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-22.pt
2024-03-09 14:54:00,138 INFO [train.py:997] (0/4) Epoch 23, batch 0, loss[loss=0.1547, simple_loss=0.2325, pruned_loss=0.03842, over 20296.00 frames. ], tot_loss[loss=0.1547, simple_loss=0.2325, pruned_loss=0.03842, over 20296.00 frames. ], batch size: 60, lr: 1.70e-02, grad_scale: 64.0
2024-03-09 14:54:00,139 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 14:54:09,892 INFO [train.py:1029] (0/4) Epoch 23, validation: loss=0.2115, simple_loss=0.3036, pruned_loss=0.0597, over 452978.00 frames.
2024-03-09 14:54:09,893 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 14:55:00,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=23373.333333333332, ans=0.125
2024-03-09 14:55:05,176 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.526e+01 7.783e+01 8.704e+01 9.596e+01 1.275e+02, threshold=1.741e+02, percent-clipped=0.0
2024-03-09 14:55:07,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=23373.333333333332, ans=0.125
2024-03-09 14:55:33,121 INFO [train.py:997] (0/4) Epoch 23, batch 50, loss[loss=0.1242, simple_loss=0.2224, pruned_loss=0.01296, over 21644.00 frames. ], tot_loss[loss=0.1508, simple_loss=0.2378, pruned_loss=0.03189, over 1055970.27 frames. ], batch size: 718, lr: 1.70e-02, grad_scale: 64.0
2024-03-09 14:55:35,060 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-03-09 14:55:50,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=23573.333333333332, ans=0.1
2024-03-09 14:55:53,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=23573.333333333332, ans=0.125
2024-03-09 14:56:01,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=23573.333333333332, ans=0.0
2024-03-09 14:56:07,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=23640.0, ans=0.0
2024-03-09 14:56:09,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=23640.0, ans=0.05
2024-03-09 14:56:51,640 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.48 vs. limit=15.0
2024-03-09 14:56:53,727 INFO [train.py:997] (0/4) Epoch 23, batch 100, loss[loss=0.1374, simple_loss=0.227, pruned_loss=0.02393, over 23991.00 frames. ], tot_loss[loss=0.1535, simple_loss=0.2411, pruned_loss=0.03292, over 1873457.87 frames. ], batch size: 142, lr: 1.69e-02, grad_scale: 64.0
2024-03-09 14:57:00,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=23840.0, ans=0.125
2024-03-09 14:57:02,393 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.81 vs. limit=15.0
2024-03-09 14:57:09,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=23906.666666666668, ans=0.125
2024-03-09 14:57:45,468 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.240e+01 7.813e+01 8.574e+01 9.589e+01 1.326e+02, threshold=1.715e+02, percent-clipped=0.0
2024-03-09 14:58:13,656 INFO [train.py:997] (0/4) Epoch 23, batch 150, loss[loss=0.1579, simple_loss=0.2442, pruned_loss=0.03576, over 24253.00 frames. ], tot_loss[loss=0.1534, simple_loss=0.2413, pruned_loss=0.03276, over 2510509.13 frames. ], batch size: 198, lr: 1.69e-02, grad_scale: 64.0
2024-03-09 14:58:25,925 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-23.pt
2024-03-09 14:59:06,778 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.44 vs. limit=22.5
2024-03-09 14:59:07,185 INFO [train.py:997] (0/4) Epoch 24, batch 0, loss[loss=0.143, simple_loss=0.2308, pruned_loss=0.02763, over 20364.00 frames. ], tot_loss[loss=0.143, simple_loss=0.2308, pruned_loss=0.02763, over 20364.00 frames. ], batch size: 60, lr: 1.66e-02, grad_scale: 64.0
2024-03-09 14:59:07,185 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 14:59:16,706 INFO [train.py:1029] (0/4) Epoch 24, validation: loss=0.2123, simple_loss=0.3043, pruned_loss=0.06014, over 452978.00 frames.
2024-03-09 14:59:16,707 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 14:59:49,533 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-03-09 14:59:51,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=24293.333333333332, ans=0.005588405797101449
2024-03-09 14:59:54,097 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-03-09 14:59:54,866 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.96 vs. limit=12.0
2024-03-09 15:00:26,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=24493.333333333332, ans=0.005544927536231884
2024-03-09 15:00:43,104 INFO [train.py:997] (0/4) Epoch 24, batch 50, loss[loss=0.1556, simple_loss=0.245, pruned_loss=0.03309, over 24205.00 frames. ], tot_loss[loss=0.1508, simple_loss=0.2378, pruned_loss=0.03194, over 1073196.93 frames. ], batch size: 295, lr: 1.65e-02, grad_scale: 64.0
2024-03-09 15:00:48,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=24560.0, ans=0.125
2024-03-09 15:00:52,062 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.04 vs. limit=10.0
2024-03-09 15:01:20,106 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.514e+01 7.866e+01 8.423e+01 9.105e+01 1.243e+02, threshold=1.685e+02, percent-clipped=0.0
2024-03-09 15:01:34,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=24760.0, ans=0.2
2024-03-09 15:02:03,700 INFO [train.py:997] (0/4) Epoch 24, batch 100, loss[loss=0.1498, simple_loss=0.2407, pruned_loss=0.02947, over 24119.00 frames. ], tot_loss[loss=0.1522, simple_loss=0.2396, pruned_loss=0.03236, over 1880690.09 frames. ], batch size: 345, lr: 1.65e-02, grad_scale: 64.0
2024-03-09 15:02:12,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=24893.333333333332, ans=0.04949747468305833
2024-03-09 15:02:31,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=24960.0, ans=0.0
2024-03-09 15:02:48,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=25026.666666666668, ans=0.125
2024-03-09 15:02:55,393 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.58 vs. limit=22.5
2024-03-09 15:02:56,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=25093.333333333332, ans=0.2
2024-03-09 15:03:24,739 INFO [train.py:997] (0/4) Epoch 24, batch 150, loss[loss=0.1926, simple_loss=0.2715, pruned_loss=0.05689, over 23321.00 frames. ], tot_loss[loss=0.1531, simple_loss=0.2406, pruned_loss=0.03286, over 2517082.11 frames. ], batch size: 534, lr: 1.65e-02, grad_scale: 64.0
2024-03-09 15:03:36,211 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-24.pt
2024-03-09 15:04:17,871 INFO [train.py:997] (0/4) Epoch 25, batch 0, loss[loss=0.1555, simple_loss=0.2383, pruned_loss=0.03632, over 23989.00 frames. ], tot_loss[loss=0.1555, simple_loss=0.2383, pruned_loss=0.03632, over 23989.00 frames. ], batch size: 165, lr: 1.61e-02, grad_scale: 64.0
2024-03-09 15:04:17,872 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 15:04:27,731 INFO [train.py:1029] (0/4) Epoch 25, validation: loss=0.2123, simple_loss=0.3048, pruned_loss=0.05995, over 452978.00 frames.
2024-03-09 15:04:27,732 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 15:04:56,153 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.291e+01 7.825e+01 8.498e+01 9.317e+01 1.197e+02, threshold=1.700e+02, percent-clipped=0.0
2024-03-09 15:05:24,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=25480.0, ans=0.0
2024-03-09 15:05:31,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=25546.666666666668, ans=0.1
2024-03-09 15:05:35,922 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.12 vs. limit=6.0
2024-03-09 15:05:38,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=25546.666666666668, ans=0.0
2024-03-09 15:05:45,983 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.51 vs. limit=15.0
2024-03-09 15:05:50,869 INFO [train.py:997] (0/4) Epoch 25, batch 50, loss[loss=0.1952, simple_loss=0.2743, pruned_loss=0.05803, over 23275.00 frames. ], tot_loss[loss=0.1525, simple_loss=0.2401, pruned_loss=0.0325, over 1057002.68 frames. ], batch size: 534, lr: 1.61e-02, grad_scale: 64.0
2024-03-09 15:06:08,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=25680.0, ans=15.0
2024-03-09 15:06:16,559 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.09 vs. limit=15.0
2024-03-09 15:07:06,212 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.74 vs. limit=15.0
2024-03-09 15:07:11,204 INFO [train.py:997] (0/4) Epoch 25, batch 100, loss[loss=0.1584, simple_loss=0.2518, pruned_loss=0.03244, over 23996.00 frames. ], tot_loss[loss=0.1517, simple_loss=0.2394, pruned_loss=0.03203, over 1879158.15 frames. ], batch size: 416, lr: 1.61e-02, grad_scale: 64.0
2024-03-09 15:07:22,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=25946.666666666668, ans=0.125
2024-03-09 15:07:37,664 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.239e+01 7.935e+01 8.679e+01 9.503e+01 1.168e+02, threshold=1.736e+02, percent-clipped=0.0
2024-03-09 15:07:50,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=26080.0, ans=0.0
2024-03-09 15:07:58,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=26146.666666666668, ans=0.07
2024-03-09 15:08:31,506 INFO [train.py:997] (0/4) Epoch 25, batch 150, loss[loss=0.1308, simple_loss=0.2132, pruned_loss=0.02421, over 23677.00 frames. ], tot_loss[loss=0.1503, simple_loss=0.2382, pruned_loss=0.03123, over 2512466.32 frames. ], batch size: 116, lr: 1.61e-02, grad_scale: 64.0
2024-03-09 15:08:43,626 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-25.pt
2024-03-09 15:09:26,510 INFO [train.py:997] (0/4) Epoch 26, batch 0, loss[loss=0.148, simple_loss=0.2348, pruned_loss=0.03061, over 24281.00 frames. ], tot_loss[loss=0.148, simple_loss=0.2348, pruned_loss=0.03061, over 24281.00 frames. ], batch size: 281, lr: 1.58e-02, grad_scale: 64.0
2024-03-09 15:09:26,510 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 15:09:35,915 INFO [train.py:1029] (0/4) Epoch 26, validation: loss=0.2091, simple_loss=0.3013, pruned_loss=0.05842, over 452978.00 frames.
2024-03-09 15:09:35,915 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 15:09:51,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=26333.333333333332, ans=0.05
2024-03-09 15:09:59,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=26400.0, ans=0.005130434782608696
2024-03-09 15:10:04,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=26400.0, ans=0.125
2024-03-09 15:10:18,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=26466.666666666668, ans=0.125
2024-03-09 15:10:46,303 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.24 vs. limit=10.0
2024-03-09 15:10:55,261 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/checkpoint-4000.pt
2024-03-09 15:10:59,541 INFO [train.py:997] (0/4) Epoch 26, batch 50, loss[loss=0.1583, simple_loss=0.2526, pruned_loss=0.032, over 24015.00 frames. ], tot_loss[loss=0.1477, simple_loss=0.2356, pruned_loss=0.02994, over 1071984.71 frames. ], batch size: 388, lr: 1.57e-02, grad_scale: 64.0
2024-03-09 15:11:01,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=26666.666666666668, ans=0.125
2024-03-09 15:11:09,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=26666.666666666668, ans=0.125
2024-03-09 15:11:11,922 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.387e+01 7.632e+01 8.183e+01 8.952e+01 1.265e+02, threshold=1.637e+02, percent-clipped=0.0
2024-03-09 15:12:19,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=26933.333333333332, ans=0.005014492753623189
2024-03-09 15:12:20,295 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.47 vs. limit=15.0
2024-03-09 15:12:21,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=27000.0, ans=0.125
2024-03-09 15:12:22,491 INFO [train.py:997] (0/4) Epoch 26, batch 100, loss[loss=0.1537, simple_loss=0.2503, pruned_loss=0.02859, over 24072.00 frames. ], tot_loss[loss=0.1487, simple_loss=0.2372, pruned_loss=0.03011, over 1874992.90 frames. ], batch size: 416, lr: 1.57e-02, grad_scale: 64.0
2024-03-09 15:13:02,494 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.10 vs. limit=15.0
2024-03-09 15:13:03,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=27133.333333333332, ans=0.125
2024-03-09 15:13:20,909 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.68 vs. limit=12.0
2024-03-09 15:13:26,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=27266.666666666668, ans=0.1
2024-03-09 15:13:41,515 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.24 vs. limit=15.0
2024-03-09 15:13:42,259 INFO [train.py:997] (0/4) Epoch 26, batch 150, loss[loss=0.1536, simple_loss=0.2487, pruned_loss=0.02921, over 23951.00 frames. ], tot_loss[loss=0.1493, simple_loss=0.2381, pruned_loss=0.0303, over 2521242.32 frames. ], batch size: 416, lr: 1.57e-02, grad_scale: 64.0
2024-03-09 15:13:49,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=27333.333333333332, ans=0.125
2024-03-09 15:13:55,059 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-26.pt
2024-03-09 15:14:38,730 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 5.880e+01 7.565e+01 8.210e+01 9.162e+01 1.256e+02, threshold=1.642e+02, percent-clipped=0.0
2024-03-09 15:14:38,763 INFO [train.py:997] (0/4) Epoch 27, batch 0, loss[loss=0.1575, simple_loss=0.2397, pruned_loss=0.03766, over 23932.00 frames. ], tot_loss[loss=0.1575, simple_loss=0.2397, pruned_loss=0.03766, over 23932.00 frames. ], batch size: 153, lr: 1.54e-02, grad_scale: 64.0
2024-03-09 15:14:38,763 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 15:14:48,406 INFO [train.py:1029] (0/4) Epoch 27, validation: loss=0.2114, simple_loss=0.3031, pruned_loss=0.05987, over 452978.00 frames.
2024-03-09 15:14:48,406 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 15:15:39,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=27520.0, ans=0.125
2024-03-09 15:15:51,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=27586.666666666668, ans=0.0
2024-03-09 15:15:54,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=27586.666666666668, ans=0.2
2024-03-09 15:16:04,701 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.76 vs. limit=10.0
2024-03-09 15:16:14,483 INFO [train.py:997] (0/4) Epoch 27, batch 50, loss[loss=0.1505, simple_loss=0.2362, pruned_loss=0.03237, over 24217.00 frames. ], tot_loss[loss=0.1516, simple_loss=0.2384, pruned_loss=0.03242, over 1078106.10 frames. ], batch size: 241, lr: 1.54e-02, grad_scale: 64.0
2024-03-09 15:16:17,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=27720.0, ans=0.125
2024-03-09 15:16:24,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=27720.0, ans=0.125
2024-03-09 15:16:25,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=27720.0, ans=0.07
2024-03-09 15:16:39,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=27786.666666666668, ans=0.004828985507246377
2024-03-09 15:16:50,030 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.75 vs. limit=10.0
2024-03-09 15:17:06,124 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-03-09 15:17:33,762 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.709e+01 7.734e+01 8.550e+01 9.615e+01 1.355e+02, threshold=1.710e+02, percent-clipped=0.0
2024-03-09 15:17:33,799 INFO [train.py:997] (0/4) Epoch 27, batch 100, loss[loss=0.1466, simple_loss=0.2324, pruned_loss=0.03042, over 23679.00 frames. ], tot_loss[loss=0.1491, simple_loss=0.2369, pruned_loss=0.03063, over 1897391.58 frames. ], batch size: 129, lr: 1.53e-02, grad_scale: 64.0
2024-03-09 15:17:44,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=28053.333333333332, ans=0.125
2024-03-09 15:18:03,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=28120.0, ans=0.025
2024-03-09 15:18:24,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=28253.333333333332, ans=0.0
2024-03-09 15:18:35,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=28253.333333333332, ans=0.2
2024-03-09 15:18:50,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=28320.0, ans=0.125
2024-03-09 15:18:54,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=28386.666666666668, ans=0.125
2024-03-09 15:18:55,715 INFO [train.py:997] (0/4) Epoch 27, batch 150, loss[loss=0.1433, simple_loss=0.2289, pruned_loss=0.02883, over 23245.00 frames. ], tot_loss[loss=0.1495, simple_loss=0.2383, pruned_loss=0.03039, over 2524252.50 frames. ], batch size: 102, lr: 1.53e-02, grad_scale: 64.0
2024-03-09 15:19:08,946 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-27.pt
2024-03-09 15:19:49,041 INFO [train.py:997] (0/4) Epoch 28, batch 0, loss[loss=0.1464, simple_loss=0.2333, pruned_loss=0.02969, over 24248.00 frames. ], tot_loss[loss=0.1464, simple_loss=0.2333, pruned_loss=0.02969, over 24248.00 frames. ], batch size: 188, lr: 1.50e-02, grad_scale: 64.0
2024-03-09 15:19:49,042 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 15:19:59,332 INFO [train.py:1029] (0/4) Epoch 28, validation: loss=0.2107, simple_loss=0.3034, pruned_loss=0.05903, over 452978.00 frames.
2024-03-09 15:19:59,333 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 15:20:54,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=28640.0, ans=0.2
2024-03-09 15:21:11,008 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.458e+01 7.529e+01 8.136e+01 8.999e+01 1.198e+02, threshold=1.627e+02, percent-clipped=0.0
2024-03-09 15:21:23,147 INFO [train.py:997] (0/4) Epoch 28, batch 50, loss[loss=0.1528, simple_loss=0.238, pruned_loss=0.03385, over 24062.00 frames. ], tot_loss[loss=0.1489, simple_loss=0.2361, pruned_loss=0.03084, over 1059671.48 frames. ], batch size: 176, lr: 1.50e-02, grad_scale: 64.0
2024-03-09 15:21:27,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=28773.333333333332, ans=0.1
2024-03-09 15:21:56,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=28906.666666666668, ans=0.004585507246376811
2024-03-09 15:22:02,501 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.92 vs. limit=15.0
2024-03-09 15:22:23,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=28973.333333333332, ans=0.125
2024-03-09 15:22:25,560 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.45 vs. limit=22.5
2024-03-09 15:22:43,071 INFO [train.py:997] (0/4) Epoch 28, batch 100, loss[loss=0.1415, simple_loss=0.2314, pruned_loss=0.02578, over 23388.00 frames. ], tot_loss[loss=0.1471, simple_loss=0.2356, pruned_loss=0.02928, over 1873994.08 frames. ], batch size: 102, lr: 1.50e-02, grad_scale: 64.0
2024-03-09 15:23:34,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=29306.666666666668, ans=0.004498550724637681
2024-03-09 15:23:50,078 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.379e+01 7.431e+01 8.104e+01 8.725e+01 1.109e+02, threshold=1.621e+02, percent-clipped=0.0
2024-03-09 15:24:02,913 INFO [train.py:997] (0/4) Epoch 28, batch 150, loss[loss=0.1391, simple_loss=0.2318, pruned_loss=0.02323, over 24072.00 frames. ], tot_loss[loss=0.1469, simple_loss=0.236, pruned_loss=0.02888, over 2513280.61 frames. ], batch size: 344, lr: 1.50e-02, grad_scale: 64.0
2024-03-09 15:24:10,027 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-03-09 15:24:15,504 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-28.pt
2024-03-09 15:24:57,644 INFO [train.py:997] (0/4) Epoch 29, batch 0, loss[loss=0.1661, simple_loss=0.2594, pruned_loss=0.03642, over 23744.00 frames. ], tot_loss[loss=0.1661, simple_loss=0.2594, pruned_loss=0.03642, over 23744.00 frames. ], batch size: 486, lr: 1.47e-02, grad_scale: 64.0
2024-03-09 15:24:57,644 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 15:25:06,829 INFO [train.py:1029] (0/4) Epoch 29, validation: loss=0.2094, simple_loss=0.3019, pruned_loss=0.05844, over 452978.00 frames.
2024-03-09 15:25:06,829 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 15:25:25,786 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-03-09 15:25:35,099 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0
2024-03-09 15:26:11,366 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-03-09 15:26:32,419 INFO [train.py:997] (0/4) Epoch 29, batch 50, loss[loss=0.16, simple_loss=0.2447, pruned_loss=0.03766, over 23922.00 frames. ], tot_loss[loss=0.1437, simple_loss=0.2346, pruned_loss=0.0264, over 1069248.24 frames. ], batch size: 153, lr: 1.47e-02, grad_scale: 64.0
2024-03-09 15:26:59,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=29893.333333333332, ans=0.125
2024-03-09 15:27:10,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=29960.0, ans=0.125
2024-03-09 15:27:27,042 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.485e+01 7.617e+01 8.419e+01 9.074e+01 1.218e+02, threshold=1.684e+02, percent-clipped=0.0
2024-03-09 15:27:34,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=30093.333333333332, ans=0.0
2024-03-09 15:27:38,565 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.31 vs. limit=15.0
2024-03-09 15:27:41,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=30093.333333333332, ans=0.125
2024-03-09 15:27:41,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=30093.333333333332, ans=0.125
2024-03-09 15:27:55,006 INFO [train.py:997] (0/4) Epoch 29, batch 100, loss[loss=0.1432, simple_loss=0.239, pruned_loss=0.02375, over 24013.00 frames. ], tot_loss[loss=0.1462, simple_loss=0.2367, pruned_loss=0.02781, over 1887373.78 frames. ], batch size: 388, lr: 1.47e-02, grad_scale: 64.0
2024-03-09 15:27:58,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=30160.0, ans=0.0
2024-03-09 15:28:25,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=30293.333333333332, ans=0.1
2024-03-09 15:28:39,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=30293.333333333332, ans=0.125
2024-03-09 15:28:48,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=30360.0, ans=0.1
2024-03-09 15:28:59,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=30426.666666666668, ans=0.125
2024-03-09 15:29:12,925 INFO [train.py:997] (0/4) Epoch 29, batch 150, loss[loss=0.1253, simple_loss=0.2077, pruned_loss=0.02143, over 23872.00 frames. ], tot_loss[loss=0.146, simple_loss=0.2357, pruned_loss=0.02818, over 2524369.47 frames. ], batch size: 117, lr: 1.46e-02, grad_scale: 64.0
2024-03-09 15:29:19,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=30493.333333333332, ans=0.0
2024-03-09 15:29:24,886 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-29.pt
2024-03-09 15:30:06,218 INFO [train.py:997] (0/4) Epoch 30, batch 0, loss[loss=0.1409, simple_loss=0.2265, pruned_loss=0.02767, over 24189.00 frames. ], tot_loss[loss=0.1409, simple_loss=0.2265, pruned_loss=0.02767, over 24189.00 frames. ], batch size: 217, lr: 1.44e-02, grad_scale: 64.0
2024-03-09 15:30:06,219 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 15:30:13,365 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6786, 4.0017, 4.5676, 3.9307], device='cuda:0')
2024-03-09 15:30:18,510 INFO [train.py:1029] (0/4) Epoch 30, validation: loss=0.2105, simple_loss=0.3027, pruned_loss=0.05915, over 452978.00 frames.
2024-03-09 15:30:18,511 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 15:30:46,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=30613.333333333332, ans=0.004214492753623188
2024-03-09 15:30:54,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=30680.0, ans=0.1
2024-03-09 15:31:01,626 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.221e+01 6.992e+01 7.523e+01 8.232e+01 1.586e+02, threshold=1.505e+02, percent-clipped=0.0
2024-03-09 15:31:15,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=30746.666666666668, ans=0.2
2024-03-09 15:31:40,991 INFO [train.py:997] (0/4) Epoch 30, batch 50, loss[loss=0.1512, simple_loss=0.2498, pruned_loss=0.02631, over 23747.00 frames. ], tot_loss[loss=0.144, simple_loss=0.2324, pruned_loss=0.02777, over 1075031.15 frames. ], batch size: 447, lr: 1.44e-02, grad_scale: 64.0
2024-03-09 15:31:42,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=30880.0, ans=0.2
2024-03-09 15:32:14,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=31013.333333333332, ans=0.125
2024-03-09 15:32:18,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=31013.333333333332, ans=0.2
2024-03-09 15:32:48,602 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.20 vs. limit=6.0
2024-03-09 15:33:01,491 INFO [train.py:997] (0/4) Epoch 30, batch 100, loss[loss=0.1258, simple_loss=0.2124, pruned_loss=0.01958, over 24093.00 frames. ], tot_loss[loss=0.1452, simple_loss=0.234, pruned_loss=0.02825, over 1888671.92 frames. ], batch size: 142, lr: 1.43e-02, grad_scale: 64.0
2024-03-09 15:33:12,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=31213.333333333332, ans=0.0
2024-03-09 15:33:24,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=31280.0, ans=0.125
2024-03-09 15:33:31,688 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.08 vs. limit=22.5
2024-03-09 15:33:43,907 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.080e+01 7.332e+01 7.826e+01 8.661e+01 1.231e+02, threshold=1.565e+02, percent-clipped=0.0
2024-03-09 15:34:20,969 INFO [train.py:997] (0/4) Epoch 30, batch 150, loss[loss=0.1436, simple_loss=0.2333, pruned_loss=0.027, over 24222.00 frames. ], tot_loss[loss=0.1449, simple_loss=0.2337, pruned_loss=0.02802, over 2520301.04 frames. ], batch size: 241, lr: 1.43e-02, grad_scale: 64.0
2024-03-09 15:34:30,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=31546.666666666668, ans=0.0
2024-03-09 15:34:33,217 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-30.pt
2024-03-09 15:34:38,240 INFO [train.py:1248] (0/4) Done!
|