2024-03-09 12:56:09,176 INFO [train.py:1065] (0/4) Training started
2024-03-09 12:56:09,193 INFO [train.py:1075] (0/4) Device: cuda:0
2024-03-09 12:56:09,282 INFO [lexicon.py:168] (0/4) Loading pre-compiled data/lang_char/Linv.pt
2024-03-09 12:56:09,334 INFO [train.py:1086] (0/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '2989b0b1186fa6022932804f5b39fbb2781ebf42', 'k2-git-date': 'Fri Nov 24 11:34:10 2023', 'lhotse-version': '1.22.0.dev+git.d8ed1bbb.dirty', 'torch-version': '1.11.0+cu102', 'torch-cuda-available': True, 'torch-cuda-version': '10.2', 'python-version': '3.9', 'icefall-git-branch': 'dev/mdcc', 'icefall-git-sha1': 'f62fc7f0-clean', 'icefall-git-date': 'Sat Mar 9 12:55:42 2024', 'icefall-path': '/star-home/jinzengrui/lib/miniconda3/envs/dev39/lib/python3.9/site-packages/icefall-1.0-py3.9.egg', 'k2-path': '/star-home/jinzengrui/lib/miniconda3/envs/dev39/lib/python3.9/site-packages/k2-1.24.4.dev20231207+cuda10.2.torch1.11.0-py3.9-linux-x86_64.egg/k2/__init__.py', 'lhotse-path': '/star-home/jinzengrui/lib/miniconda3/envs/dev39/lib/python3.9/site-packages/lhotse-1.22.0.dev0+git.d8ed1bbb.dirty-py3.9.egg/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-2-1207150844-f49d8c4f4-c49d5', 'IP address': '10.177.22.19'}, 'world_size': 4, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 30, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('zipformer/exp'), 'lang_dir': PosixPath('data/lang_char'), 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'context_size': 1, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 1000, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'blank_id': 0, 'vocab_size': 4852}
2024-03-09 12:56:09,334 INFO [train.py:1088] (0/4) About to create model
2024-03-09 12:56:09,995 INFO [train.py:1092] (0/4) Number of model parameters: 74470867
2024-03-09 12:56:14,924 INFO [train.py:1107] (0/4) Using DDP
2024-03-09 12:56:15,509 INFO [asr_datamodule.py:368] (0/4) About to get train cuts
2024-03-09 12:56:15,622 INFO [asr_datamodule.py:376] (0/4) About to get valid cuts
2024-03-09 12:56:15,640 INFO [asr_datamodule.py:195] (0/4) About to get Musan cuts
2024-03-09 12:56:18,183 INFO [asr_datamodule.py:200] (0/4) Enable MUSAN
2024-03-09 12:56:18,183 INFO [asr_datamodule.py:223] (0/4) Enable SpecAugment
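The hyperparameters above (prune_range=5, lm_scale=0.25, am_scale=0.0, simple_loss_scale=0.5, blank_id=0) and the simple_loss/pruned_loss terms in the batch lines below correspond to k2's pruned transducer loss. A condensed sketch, under the assumption that this train.py follows the usual icefall pattern; the joiner call and tensor shapes are illustrative, not copied from this recipe:

import k2
import torch

def pruned_transducer_loss(am, lm, y_padded, boundary, joiner,
                           prune_range=5, lm_scale=0.25, am_scale=0.0,
                           simple_loss_scale=0.5):
    # First pass: cheap "simple" loss on projected AM/LM features; its
    # gradients indicate which (t, u) lattice cells matter.
    simple_loss, (px_grad, py_grad) = k2.rnnt_loss_smoothed(
        lm=lm, am=am, symbols=y_padded, termination_symbol=0,
        lm_only_scale=lm_scale, am_only_scale=am_scale,
        boundary=boundary, reduction="sum", return_grad=True)
    # Keep only a band of prune_range symbols per frame for the full joiner.
    ranges = k2.get_rnnt_prune_ranges(
        px_grad=px_grad, py_grad=py_grad,
        boundary=boundary, s_range=prune_range)
    am_pruned, lm_pruned = k2.do_rnnt_pruning(am=am, lm=lm, ranges=ranges)
    logits = joiner(am_pruned, lm_pruned)   # hypothetical joiner module
    pruned_loss = k2.rnnt_loss_pruned(
        logits=logits, symbols=y_padded, ranges=ranges,
        termination_symbol=0, boundary=boundary, reduction="sum")
    return simple_loss_scale * simple_loss + pruned_loss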
2024-03-09 12:56:18,183 INFO [asr_datamodule.py:224] (0/4) Time warp factor: 80
2024-03-09 12:56:18,184 INFO [asr_datamodule.py:234] (0/4) Num frame mask: 10
2024-03-09 12:56:18,184 INFO [asr_datamodule.py:247] (0/4) About to create train dataset
2024-03-09 12:56:18,184 INFO [asr_datamodule.py:273] (0/4) Using DynamicBucketingSampler.
2024-03-09 12:56:19,023 INFO [asr_datamodule.py:290] (0/4) About to create train dataloader
2024-03-09 12:56:19,023 INFO [asr_datamodule.py:315] (0/4) About to create dev dataset
2024-03-09 12:56:19,346 INFO [asr_datamodule.py:332] (0/4) About to create dev dataloader
2024-03-09 12:57:18,484 INFO [train.py:997] (0/4) Epoch 1, batch 0, loss[loss=10.43, simple_loss=9.503, pruned_loss=9.26, over 23353.00 frames. ], tot_loss[loss=10.43, simple_loss=9.503, pruned_loss=9.26, over 23353.00 frames. ], batch size: 102, lr: 2.25e-02, grad_scale: 1.0
2024-03-09 12:57:18,486 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 12:57:28,778 INFO [train.py:1029] (0/4) Epoch 1, validation: loss=10.41, simple_loss=9.49, pruned_loss=9.134, over 452978.00 frames.
2024-03-09 12:57:28,779 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 25901MB
2024-03-09 12:57:35,930 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=0.0, ans=5.0
2024-03-09 12:57:38,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=0.0, ans=0.3
2024-03-09 12:57:42,630 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.90 vs. limit=5.0
2024-03-09 12:57:42,680 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=29.82 vs. limit=7.5
2024-03-09 12:57:45,309 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=66.66666666666667, ans=0.1975
2024-03-09 12:57:49,154 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=66.66666666666667, ans=0.0985
2024-03-09 12:57:52,244 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 5.247e+03 5.651e+03 5.908e+03 6.903e+03 6.981e+03, threshold=2.363e+04, percent-clipped=0.0
2024-03-09 12:57:58,163 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=66.66666666666667, ans=0.496875
2024-03-09 12:57:58,667 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=107.87 vs. limit=7.525
2024-03-09 12:58:10,355 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.724e+03 3.453e+03 5.651e+03 6.615e+03 7.215e+03, threshold=2.260e+04, percent-clipped=0.0
2024-03-09 12:58:13,213 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=511.69 vs. limit=7.6
2024-03-09 12:58:15,224 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=232.33 vs. limit=7.6
2024-03-09 12:58:18,454 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=87.97 vs. limit=4.053333333333334
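The datamodule lines above (MUSAN enabled, SpecAugment with time warp factor 80 and 10 frame masks, DynamicBucketingSampler) map onto standard lhotse building blocks. A rough sketch of that wiring, assuming the recipe follows the common icefall asr_datamodule.py pattern; max_duration, num_buckets and num_workers come from the config dict, the remaining SpecAugment/CutMix arguments are typical defaults, not read from this log:

from torch.utils.data import DataLoader
from lhotse.dataset import (CutMix, DynamicBucketingSampler,
                            K2SpeechRecognitionDataset, SpecAugment)

# cuts_train / cuts_musan are lhotse CutSets loaded from data/fbank manifests.
transforms = [CutMix(cuts=cuts_musan, p=0.5, snr=(10, 20), preserve_id=True)]
input_transforms = [SpecAugment(time_warp_factor=80, num_frame_masks=10,
                                features_mask_size=27, num_feature_masks=2,
                                frames_mask_size=100)]
train_dataset = K2SpeechRecognitionDataset(
    cut_transforms=transforms,
    input_transforms=input_transforms,
    return_cuts=True)
train_sampler = DynamicBucketingSampler(
    cuts_train, max_duration=1000.0,   # seconds of audio per batch
    shuffle=True, num_buckets=30, drop_last=True)
train_dataloader = DataLoader(train_dataset, sampler=train_sampler,
                              batch_size=None, num_workers=2)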
2024-03-09 12:58:24,071 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=223.96 vs. limit=7.575
2024-03-09 12:58:31,644 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=149.27 vs. limit=7.575
2024-03-09 12:58:34,666 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=254.66 vs. limit=7.65
2024-03-09 12:58:35,005 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=277.45 vs. limit=7.575
2024-03-09 12:58:45,198 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=72.53 vs. limit=4.1066666666666665
2024-03-09 12:58:46,102 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.817e+02 1.921e+03 2.306e+03 5.651e+03 7.215e+03, threshold=9.223e+03, percent-clipped=0.0
2024-03-09 12:58:52,634 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=168.22 vs. limit=5.133333333333334
2024-03-09 12:58:59,108 INFO [train.py:997] (0/4) Epoch 1, batch 50, loss[loss=1.111, simple_loss=0.9911, pruned_loss=1.077, over 20140.00 frames. ], tot_loss[loss=3.869, simple_loss=3.562, pruned_loss=3.019, over 1065856.81 frames. ], batch size: 61, lr: 2.48e-02, grad_scale: 0.25
2024-03-09 12:59:00,044 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=367.12 vs. limit=7.625
2024-03-09 12:59:01,816 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=342.93 vs. limit=7.75
2024-03-09 12:59:05,179 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=333.3333333333333, ans=0.8883333333333333
2024-03-09 12:59:11,230 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=468.64 vs. limit=7.625
2024-03-09 12:59:18,026 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=231.76 vs. limit=7.8
2024-03-09 12:59:18,181 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.37 vs. limit=3.06
2024-03-09 12:59:20,233 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=3.06
2024-03-09 12:59:20,261 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=338.60 vs. limit=7.65
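The Whitening lines report how far a module's output covariance is from "white" (a multiple of the identity): a metric of 1.0 means perfectly white, and values above the scheduled limit trigger a corrective gradient in scaling.py. A plausible torch formulation of such a metric, computed per channel group; this is a sketch of the idea, not a copy of icefall's code:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
    # x: (..., num_channels). The metric is mean(eig^2) / mean(eig)^2 of the
    # per-group channel covariance, which equals 1.0 iff covariance = c * I.
    num_channels = x.shape[-1]
    ch = num_channels // num_groups
    x = x.reshape(-1, num_groups, ch).transpose(0, 1)      # (groups, frames, ch)
    x = x - x.mean(dim=1, keepdim=True)
    cov = torch.matmul(x.transpose(1, 2), x) / x.shape[1]  # (groups, ch, ch)
    # trace(cov^2) = sum of squared eigenvalues; trace(cov) = their sum.
    tr_cov = cov.diagonal(dim1=1, dim2=2).sum(dim=1)
    tr_cov2 = (cov ** 2).sum(dim=(1, 2))
    metric = (tr_cov2 * ch) / (tr_cov ** 2 + 1e-20)
    return metric.mean()

By Cauchy-Schwarz this is always >= 1.0, consistent with the logged values, and it grows as a few eigendirections dominate the output statistics.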
2024-03-09 12:59:26,072 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=400.0, ans=0.20600000000000002
2024-03-09 12:59:26,109 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=400.0, ans=0.296
2024-03-09 12:59:32,398 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=315.58 vs. limit=5.2
2024-03-09 12:59:48,062 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=112.75 vs. limit=7.675
2024-03-09 13:00:01,984 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=533.3333333333334, ans=0.29466666666666663
2024-03-09 13:00:02,712 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=229.75 vs. limit=7.7
2024-03-09 13:00:04,724 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.56 vs. limit=4.213333333333333
2024-03-09 13:00:08,200 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=87.24 vs. limit=7.9
2024-03-09 13:00:13,532 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=7.42 vs. limit=4.24
2024-03-09 13:00:22,131 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=239.35 vs. limit=7.725
2024-03-09 13:00:23,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=600.0, ans=0.294
2024-03-09 13:00:24,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=600.0, ans=5.3
2024-03-09 13:00:31,788 INFO [train.py:997] (0/4) Epoch 1, batch 100, loss[loss=1.057, simple_loss=0.9247, pruned_loss=1.073, over 24275.00 frames. ], tot_loss[loss=2.348, simple_loss=2.139, pruned_loss=1.952, over 1881556.47 frames. ], batch size: 267, lr: 2.70e-02, grad_scale: 0.5
2024-03-09 13:00:32,876 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.32 vs. limit=4.266666666666667
2024-03-09 13:00:37,048 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 5.046e+01 9.193e+01 2.011e+02 2.156e+03 7.215e+03, threshold=4.023e+02, percent-clipped=0.0
2024-03-09 13:00:41,780 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=64.02 vs. limit=7.75
2024-03-09 13:00:43,219 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=56.94 vs. limit=5.333333333333333
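The optim.py WARNING lines summarize recent gradient norms as five quantiles (min/25%/median/75%/max) plus a clipping threshold, and percent-clipped tracks how often batches hit it. A hedged sketch of that bookkeeping, assuming the threshold is derived from a running median scaled by the logged Clipping_scale; the precise rule inside icefall's ScaledAdam may differ:

import torch
from collections import deque

class GradNormClipper:
    def __init__(self, clipping_scale=2.0, window=200, log_every=50):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)   # recent total grad norms
        self.log_every = log_every
        self.num_clipped = 0
        self.num_seen = 0

    def clip_(self, params):
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads]))
        self.norms.append(norm.item())
        sorted_norms = sorted(self.norms)
        quartiles = [sorted_norms[int(q * (len(sorted_norms) - 1))]
                     for q in (0.0, 0.25, 0.5, 0.75, 1.0)]
        threshold = self.clipping_scale * quartiles[2]   # scale * median
        self.num_seen += 1
        if norm > threshold:
            self.num_clipped += 1
            for g in grads:
                g.mul_(threshold / norm)                 # rescale in place
        if self.num_seen % self.log_every == 0:
            print(f"Clipping_scale={self.clipping_scale}, grad-norm quartiles "
                  + " ".join(f"{q:.3e}" for q in quartiles)
                  + f", threshold={threshold:.3e}, "
                  + f"percent-clipped={100.0 * self.num_clipped / self.num_seen}")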
2024-03-09 13:00:51,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=733.3333333333334, ans=0.8743333333333334
2024-03-09 13:00:56,681 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=733.3333333333334, ans=0.04770833333333334
2024-03-09 13:01:02,529 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=13.14 vs. limit=5.183333333333334
2024-03-09 13:01:06,340 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=26.22 vs. limit=8.1
2024-03-09 13:01:10,128 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=49.03 vs. limit=7.8
2024-03-09 13:01:14,956 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=50.68 vs. limit=8.1
2024-03-09 13:01:29,550 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=160.24 vs. limit=7.825
2024-03-09 13:01:53,106 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=933.3333333333334, ans=0.04708333333333334
2024-03-09 13:01:53,657 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=16.97 vs. limit=5.233333333333333
2024-03-09 13:02:03,201 INFO [train.py:997] (0/4) Epoch 1, batch 150, loss[loss=0.9259, simple_loss=0.7907, pruned_loss=0.9827, over 24134.00 frames. ], tot_loss[loss=1.782, simple_loss=1.604, pruned_loss=1.567, over 2516736.17 frames. ], batch size: 176, lr: 2.93e-02, grad_scale: 0.5
2024-03-09 13:02:11,048 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=218.28 vs. limit=7.875
2024-03-09 13:02:12,725 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.56 vs. limit=4.4
2024-03-09 13:02:14,113 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=70.25 vs. limit=8.25
2024-03-09 13:02:14,486 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=56.43 vs. limit=7.875
2024-03-09 13:02:16,403 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-1.pt
2024-03-09 13:03:01,373 INFO [train.py:997] (0/4) Epoch 2, batch 0, loss[loss=1.018, simple_loss=0.8779, pruned_loss=1.021, over 23797.00 frames. ], tot_loss[loss=1.018, simple_loss=0.8779, pruned_loss=1.021, over 23797.00 frames. ], batch size: 447, lr: 2.91e-02, grad_scale: 1.0
2024-03-09 13:03:01,374 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 13:03:11,801 INFO [train.py:1029] (0/4) Epoch 2, validation: loss=0.9516, simple_loss=0.8161, pruned_loss=0.9787, over 452978.00 frames.
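The ScheduledFloat lines print hyperparameters (dropout rates, skip rates, whitening limits, balancer bounds) that are annealed as a function of batch_count rather than held fixed. One way to implement such a value is piecewise-linear interpolation over (batch_count, value) breakpoints; the breakpoints below are chosen for illustration, although a (0, 0.3) -> (20000, 0.1) dropout schedule does reproduce the ans= values logged above at batch_count 400, 533.33 and 600:

class ScheduledFloat:
    """A float-valued hyperparameter interpolated against a batch counter."""
    def __init__(self, *points):
        self.points = sorted(points)       # e.g. (0.0, 0.3), (20000.0, 0.1)
        self.batch_count = 0.0

    def __float__(self):
        x = self.batch_count
        (x0, y0), *rest = self.points
        if x <= x0:
            return float(y0)
        for (x1, y1) in rest:
            if x <= x1:
                # linear interpolation between neighbouring breakpoints
                return float(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
            x0, y0 = x1, y1
        return float(y0)

dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
dropout_p.batch_count = 400.0
print(float(dropout_p))    # 0.296, matching the ans= value at batch_count=400.0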
2024-03-09 13:03:11,802 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 13:03:14,878 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=363.92 vs. limit=7.895
2024-03-09 13:03:30,549 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=63.67 vs. limit=7.92
2024-03-09 13:03:35,802 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=44.64 vs. limit=8.34
2024-03-09 13:03:35,858 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=193.72 vs. limit=7.92
2024-03-09 13:03:39,280 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=114.96 vs. limit=7.92
2024-03-09 13:03:45,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=1186.6666666666667, ans=0.444375
2024-03-09 13:03:45,872 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=1186.6666666666667, ans=0.8584666666666667
2024-03-09 13:03:46,654 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=362.60 vs. limit=7.945
2024-03-09 13:03:47,578 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=1186.6666666666667, ans=0.04629166666666667
2024-03-09 13:03:52,886 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=1186.6666666666667, ans=0.1555
2024-03-09 13:04:06,820 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=1253.3333333333333, ans=0.09216666666666667
2024-03-09 13:04:07,408 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=28.63 vs. limit=7.97
2024-03-09 13:04:07,600 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=80.10 vs. limit=7.97
2024-03-09 13:04:07,894 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=24.08 vs. limit=7.97
2024-03-09 13:04:16,360 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.27 vs. limit=7.97
2024-03-09 13:04:22,922 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=1320.0, ans=0.8538
2024-03-09 13:04:25,867 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=230.87 vs. limit=7.995
2024-03-09 13:04:29,204 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=64.66 vs. limit=5.66
2024-03-09 13:04:31,668 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 4.992e+01 8.885e+01 1.035e+02 1.288e+02 2.193e+02, threshold=2.069e+02, percent-clipped=0.0
2024-03-09 13:04:36,676 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=97.09 vs. limit=7.995
2024-03-09 13:04:39,621 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=1320.0, ans=0.8538
2024-03-09 13:04:42,154 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=38.92 vs. limit=8.02
2024-03-09 13:04:42,729 INFO [train.py:997] (0/4) Epoch 2, batch 50, loss[loss=0.9398, simple_loss=0.8078, pruned_loss=0.898, over 23692.00 frames. ], tot_loss[loss=0.9102, simple_loss=0.778, pruned_loss=0.9183, over 1074146.93 frames. ], batch size: 486, lr: 3.13e-02, grad_scale: 1.0
2024-03-09 13:04:43,624 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=71.38 vs. limit=8.02
2024-03-09 13:04:56,020 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=16.02 vs. limit=5.693333333333333
2024-03-09 13:04:57,963 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.01 vs. limit=5.346666666666667
2024-03-09 13:04:59,627 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=47.54 vs. limit=8.59
2024-03-09 13:05:04,752 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.79 vs. limit=4.581333333333333
2024-03-09 13:05:10,192 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.84 vs. limit=5.363333333333333
2024-03-09 13:05:17,322 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1453.3333333333333, ans=0.28546666666666665
2024-03-09 13:05:22,383 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1520.0, ans=0.14300000000000002
2024-03-09 13:05:31,745 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=16.24 vs. limit=8.07
2024-03-09 13:05:47,625 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=81.22 vs. limit=8.69
2024-03-09 13:05:49,321 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=53.78 vs. limit=8.69
2024-03-09 13:05:59,965 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=44.57 vs. limit=8.74
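The lr values in the batch lines (2.25e-02 at batch 0, rising toward base_lr=0.045 and then decaying) follow icefall's Eden-style schedule, driven by both batch count and epoch with the lr_batches=7500 and lr_epochs=3.5 from the config above. A sketch of the usual formula; treat the exact exponents and warmup shape as an assumption rather than a quote from this optim.py:

def eden_lr(base_lr, batch, epoch,
            lr_batches=7500.0, lr_epochs=3.5, warmup_batches=500.0):
    # lr decays smoothly in both the number of batches and epochs seen
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    # linear warmup from 0.5 to 1.0 over the first warmup_batches
    warmup = 0.5 + 0.5 * min(batch / warmup_batches, 1.0)
    return base_lr * batch_factor * epoch_factor * warmup

print(eden_lr(0.045, batch=0, epoch=1))   # ~2.2e-02, near the logged 2.25e-02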
2024-03-09 13:06:02,563 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=1653.3333333333333, ans=0.138
2024-03-09 13:06:07,183 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=16.18 vs. limit=8.12
2024-03-09 13:06:09,483 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1653.3333333333333, ans=0.4225
2024-03-09 13:06:11,338 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=1653.3333333333333, ans=0.4225
2024-03-09 13:06:16,168 INFO [train.py:997] (0/4) Epoch 2, batch 100, loss[loss=0.9075, simple_loss=0.7764, pruned_loss=0.8376, over 23795.00 frames. ], tot_loss[loss=0.8809, simple_loss=0.7521, pruned_loss=0.8645, over 1877834.46 frames. ], batch size: 447, lr: 3.35e-02, grad_scale: 2.0
2024-03-09 13:06:21,695 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=1720.0, ans=0.419375
2024-03-09 13:06:34,717 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.52 vs. limit=5.446666666666666
2024-03-09 13:06:36,721 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.50 vs. limit=8.84
2024-03-09 13:06:41,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1786.6666666666667, ans=0.41625
2024-03-09 13:06:52,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1853.3333333333333, ans=0.2683333333333333
2024-03-09 13:07:02,209 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=68.35 vs. limit=8.89
2024-03-09 13:07:22,188 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=23.25 vs. limit=8.22
2024-03-09 13:07:25,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1920.0, ans=0.41000000000000003
2024-03-09 13:07:26,233 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=21.12 vs. limit=5.96
2024-03-09 13:07:26,415 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=50.27 vs. limit=8.22
2024-03-09 13:07:35,929 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=195.77 vs. limit=8.245
2024-03-09 13:07:37,966 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.386e+01 8.999e+01 1.029e+02 1.187e+02 2.200e+02, threshold=2.058e+02, percent-clipped=1.0
2024-03-09 13:07:46,563 INFO [train.py:997] (0/4) Epoch 2, batch 150, loss[loss=0.8338, simple_loss=0.7059, pruned_loss=0.763, over 23219.00 frames. ], tot_loss[loss=0.8662, simple_loss=0.7386, pruned_loss=0.8275, over 2515639.78 frames. ], batch size: 102, lr: 3.57e-02, grad_scale: 2.0
2024-03-09 13:07:47,938 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.85 vs. limit=8.27
2024-03-09 13:07:59,675 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-2.pt
2024-03-09 13:08:43,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=2106.6666666666665, ans=0.8262666666666667
2024-03-09 13:08:44,927 INFO [train.py:997] (0/4) Epoch 3, batch 0, loss[loss=0.7839, simple_loss=0.6614, pruned_loss=0.7208, over 23163.00 frames. ], tot_loss[loss=0.7839, simple_loss=0.6614, pruned_loss=0.7208, over 23163.00 frames. ], batch size: 102, lr: 3.42e-02, grad_scale: 4.0
2024-03-09 13:08:44,928 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 13:08:54,190 INFO [train.py:1029] (0/4) Epoch 3, validation: loss=0.8556, simple_loss=0.7313, pruned_loss=0.7513, over 452978.00 frames.
2024-03-09 13:08:54,190 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 13:08:55,429 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=35.47 vs. limit=8.29
2024-03-09 13:09:00,497 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.23 vs. limit=9.08
2024-03-09 13:09:01,643 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2106.6666666666665, ans=0.2366666666666667
2024-03-09 13:09:09,363 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.32 vs. limit=4.842666666666666
2024-03-09 13:09:16,520 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=39.43 vs. limit=8.315
2024-03-09 13:09:19,274 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=2173.3333333333335, ans=0.0511
2024-03-09 13:09:22,046 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=53.72 vs. limit=6.086666666666667
2024-03-09 13:09:25,400 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=27.77 vs. limit=9.13
2024-03-09 13:09:35,460 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.22 vs. limit=6.12
2024-03-09 13:09:53,173 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=2306.6666666666665, ans=0.04279166666666667
2024-03-09 13:09:54,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2306.6666666666665, ans=0.391875
2024-03-09 13:10:26,707 INFO [train.py:997] (0/4) Epoch 3, batch 50, loss[loss=0.7973, simple_loss=0.6776, pruned_loss=0.6859, over 19767.00 frames. ], tot_loss[loss=0.8008, simple_loss=0.6802, pruned_loss=0.7039, over 1068611.49 frames. ], batch size: 59, lr: 3.63e-02, grad_scale: 4.0
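The checkpoint.py lines record an end-of-epoch save to zipformer/exp/epoch-N.pt. A minimal sketch of what such a checkpoint typically bundles; the field names are illustrative, and icefall's checkpoint.py additionally stores things like sampler and grad-scaler state:

import torch

def save_checkpoint(filename, model, optimizer, scheduler, epoch):
    # unwrap DDP so the checkpoint can be loaded without a process group
    module = model.module if hasattr(model, "module") else model
    torch.save({
        "model": module.state_dict(),
        "optimizer": optimizer.state_dict(),
        "scheduler": scheduler.state_dict(),
        "epoch": epoch,
    }, filename)

# e.g. save_checkpoint("zipformer/exp/epoch-2.pt", model, optimizer,
#                      scheduler, epoch=2)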
2024-03-09 13:10:38,491 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=30.10 vs. limit=8.415
2024-03-09 13:10:39,380 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=2440.0, ans=0.042375
2024-03-09 13:10:42,797 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=2506.6666666666665, ans=0.3825
2024-03-09 13:10:42,882 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=2506.6666666666665, ans=0.2376
2024-03-09 13:10:48,921 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.42 vs. limit=5.626666666666667
2024-03-09 13:10:52,645 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=46.87 vs. limit=8.44
2024-03-09 13:11:04,426 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.19 vs. limit=9.43
2024-03-09 13:11:04,586 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.16 vs. limit=9.43
2024-03-09 13:11:04,784 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=14.16 vs. limit=8.465
2024-03-09 13:11:19,166 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=2640.0, ans=0.10099999999999999
2024-03-09 13:11:28,814 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.96 vs. limit=6.32
2024-03-09 13:11:34,500 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.376e+01 1.355e+02 1.829e+02 2.456e+02 5.542e+02, threshold=3.657e+02, percent-clipped=39.0
2024-03-09 13:11:39,146 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=30.81 vs. limit=8.515
2024-03-09 13:11:47,777 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.15 vs. limit=5.676666666666667
2024-03-09 13:11:55,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=2773.3333333333335, ans=0.037599999999999995
2024-03-09 13:11:56,951 INFO [train.py:997] (0/4) Epoch 3, batch 100, loss[loss=0.6924, simple_loss=0.5943, pruned_loss=0.5563, over 24243.00 frames. ], tot_loss[loss=0.7716, simple_loss=0.6582, pruned_loss=0.6552, over 1880343.40 frames. ], batch size: 188, lr: 3.84e-02, grad_scale: 8.0
2024-03-09 13:12:03,224 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.49 vs. limit=9.58
2024-03-09 13:12:12,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=2840.0, ans=0.366875
2024-03-09 13:12:18,689 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=2840.0, ans=0.2216
2024-03-09 13:12:19,376 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.28 vs. limit=8.565
2024-03-09 13:12:21,272 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.02 vs. limit=9.629999999999999
2024-03-09 13:12:24,381 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.33 vs. limit=9.629999999999999
2024-03-09 13:12:26,266 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.89 vs. limit=8.565
2024-03-09 13:12:46,236 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.80 vs. limit=6.453333333333333
2024-03-09 13:12:53,814 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.42 vs. limit=6.486666666666666
2024-03-09 13:13:19,575 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.86 vs. limit=8.64
2024-03-09 13:13:21,284 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.75 vs. limit=9.78
2024-03-09 13:13:25,257 INFO [train.py:997] (0/4) Epoch 3, batch 150, loss[loss=0.6125, simple_loss=0.5407, pruned_loss=0.4349, over 24159.00 frames. ], tot_loss[loss=0.7192, simple_loss=0.6192, pruned_loss=0.5825, over 2517629.28 frames. ], batch size: 295, lr: 4.05e-02, grad_scale: 8.0
2024-03-09 13:13:30,254 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=3106.6666666666665, ans=6.941666666666666
2024-03-09 13:13:33,671 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=7.991e+01
2024-03-09 13:13:35,995 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=22.86 vs. limit=8.665
2024-03-09 13:13:38,207 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-3.pt
2024-03-09 13:14:26,850 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.42 vs. limit=6.58
2024-03-09 13:14:27,472 INFO [train.py:997] (0/4) Epoch 4, batch 0, loss[loss=0.5953, simple_loss=0.5279, pruned_loss=0.4147, over 24073.00 frames. ], tot_loss[loss=0.5953, simple_loss=0.5279, pruned_loss=0.4147, over 24073.00 frames. ], batch size: 365, lr: 3.82e-02, grad_scale: 16.0
2024-03-09 13:14:27,473 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 13:14:37,768 INFO [train.py:1029] (0/4) Epoch 4, validation: loss=0.515, simple_loss=0.4763, pruned_loss=0.3039, over 452978.00 frames.
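Each "Computing validation loss" / "Epoch N, validation: ..." pair comes from a pass over the fixed dev set, which is why every report above covers the same 452978.00 frames. A hedged sketch of that loop; compute_loss stands in for whatever per-batch loss function this train.py actually uses:

import torch

def compute_validation_loss(model, valid_dataloader, device):
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():                   # no gradients on the dev pass
        for batch in valid_dataloader:
            # compute_loss is a hypothetical helper returning (loss, frames)
            loss, num_frames = compute_loss(model, batch, device)
            tot_loss += loss.item()
            tot_frames += num_frames
    model.train()
    return tot_loss / tot_frames            # frame-normalized, as logged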
2024-03-09 13:14:37,769 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 13:14:44,077 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.34 vs. limit=9.870000000000001
2024-03-09 13:15:00,759 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.29 vs. limit=8.71
2024-03-09 13:15:10,612 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=10.39 vs. limit=9.92
2024-03-09 13:15:18,525 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3293.3333333333335, ans=0.34562499999999996
2024-03-09 13:15:20,183 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3293.3333333333335, ans=0.07649999999999998
2024-03-09 13:15:23,589 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3293.3333333333335, ans=0.34562499999999996
2024-03-09 13:15:26,888 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3293.3333333333335, ans=0.34562499999999996
2024-03-09 13:15:31,400 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.600e+02 2.775e+02 3.449e+02 4.262e+02 1.233e+03, threshold=6.899e+02, percent-clipped=36.0
2024-03-09 13:15:36,772 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3360.0, ans=0.3425
2024-03-09 13:15:38,577 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3360.0, ans=0.3425
2024-03-09 13:15:42,536 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=10.28 vs. limit=10.02
2024-03-09 13:15:47,758 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.70 vs. limit=8.785
2024-03-09 13:15:55,046 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3426.6666666666665, ans=0.339375
2024-03-09 13:15:57,310 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.04 vs. limit=10.07
2024-03-09 13:16:02,389 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=10.86 vs. limit=10.07
2024-03-09 13:16:07,037 INFO [train.py:997] (0/4) Epoch 4, batch 50, loss[loss=0.4472, simple_loss=0.4138, pruned_loss=0.2608, over 20172.00 frames. ], tot_loss[loss=0.5215, simple_loss=0.4711, pruned_loss=0.3366, over 1061168.05 frames. ], batch size: 60, lr: 3.92e-02, grad_scale: 8.0
2024-03-09 13:16:15,208 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.97 vs. limit=10.120000000000001
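The "Maximum memory allocated" lines are per-rank peaks from the CUDA caching allocator, printed after each validation pass. A one-line sketch of how that number is typically obtained:

import torch

def log_peak_memory(device: torch.device):
    # peak bytes allocated on this rank since startup (or the last reset)
    peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"Maximum memory allocated so far is {peak_mb}MB")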
2024-03-09 13:16:25,705 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3560.0, ans=0.035
2024-03-09 13:16:48,326 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.66 vs. limit=8.86
2024-03-09 13:16:56,498 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.45 vs. limit=8.86
2024-03-09 13:17:03,789 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3693.3333333333335, ans=0.21306666666666665
2024-03-09 13:17:04,647 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=10.26 vs. limit=10.27
2024-03-09 13:17:24,785 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3760.0, ans=0.26239999999999997
2024-03-09 13:17:26,432 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3760.0, ans=0.07
2024-03-09 13:17:26,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3760.0, ans=0.05899999999999997
2024-03-09 13:17:29,825 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3760.0, ans=0.32375
2024-03-09 13:17:32,094 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.67 vs. limit=8.935
2024-03-09 13:17:32,113 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.74 vs. limit=8.935
2024-03-09 13:17:32,782 INFO [train.py:997] (0/4) Epoch 4, batch 100, loss[loss=0.4661, simple_loss=0.434, pruned_loss=0.2632, over 24016.00 frames. ], tot_loss[loss=0.4865, simple_loss=0.4458, pruned_loss=0.2959, over 1885565.16 frames. ], batch size: 388, lr: 3.92e-02, grad_scale: 8.0
2024-03-09 13:17:56,788 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.83 vs. limit=8.96
2024-03-09 13:18:06,591 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.27 vs. limit=5.99
2024-03-09 13:18:14,833 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.31 vs. limit=8.985
2024-03-09 13:18:26,601 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.478e+02 2.209e+02 2.728e+02 3.814e+02 7.926e+02, threshold=5.455e+02, percent-clipped=1.0
2024-03-09 13:18:30,780 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.13 vs. limit=5.610666666666667
2024-03-09 13:18:57,703 INFO [train.py:997] (0/4) Epoch 4, batch 150, loss[loss=0.4913, simple_loss=0.4534, pruned_loss=0.2852, over 23722.00 frames. ], tot_loss[loss=0.4589, simple_loss=0.4257, pruned_loss=0.2654, over 2519530.07 frames. ], batch size: 486, lr: 3.91e-02, grad_scale: 8.0
2024-03-09 13:19:01,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4160.0, ans=0.305
2024-03-09 13:19:02,828 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4160.0, ans=0.2584
2024-03-09 13:19:10,215 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-4.pt
2024-03-09 13:19:56,274 INFO [train.py:997] (0/4) Epoch 5, batch 0, loss[loss=0.3976, simple_loss=0.3804, pruned_loss=0.2011, over 24158.00 frames. ], tot_loss[loss=0.3976, simple_loss=0.3804, pruned_loss=0.2011, over 24158.00 frames. ], batch size: 366, lr: 3.65e-02, grad_scale: 16.0
2024-03-09 13:19:56,275 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 13:20:05,955 INFO [train.py:1029] (0/4) Epoch 5, validation: loss=0.3626, simple_loss=0.3682, pruned_loss=0.1368, over 452978.00 frames.
2024-03-09 13:20:05,956 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 13:20:37,826 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=4346.666666666667, ans=0.07283333333333333
2024-03-09 13:20:54,187 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4413.333333333333, ans=0.29312499999999997
2024-03-09 13:21:12,948 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.13 vs. limit=5.792
2024-03-09 13:21:23,619 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.42 vs. limit=6.12
2024-03-09 13:21:30,491 INFO [train.py:997] (0/4) Epoch 5, batch 50, loss[loss=0.3398, simple_loss=0.3376, pruned_loss=0.1468, over 24103.00 frames. ], tot_loss[loss=0.3685, simple_loss=0.3589, pruned_loss=0.1733, over 1069272.23 frames. ], batch size: 165, lr: 3.64e-02, grad_scale: 8.0
2024-03-09 13:21:32,364 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4546.666666666667, ans=0.009881159420289855
2024-03-09 13:21:44,554 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=11.74 vs. limit=10.91
2024-03-09 13:21:57,639 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.55 vs. limit=9.23
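use_fp16=True in the config and the grad_scale values in the batch lines (rising from 1.0 toward 32.0 and dipping after overflows) are the footprint of mixed-precision training with a dynamic loss scaler. A generic torch.cuda.amp sketch of that mechanic; the recipe's actual scaler policy may differ, and compute_loss is a hypothetical helper:

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=1.0)  # grad_scale: 1.0 at batch 0
for batch in train_dataloader:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()     # scale up to avoid fp16 grad underflow
    scaler.step(optimizer)            # unscales; skips the step on inf/nan
    scaler.update()                   # grows or shrinks grad_scale dynamically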
2024-03-09 13:22:09,206 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.206e+02 1.970e+02 2.387e+02 3.231e+02 6.932e+02, threshold=4.775e+02, percent-clipped=2.0
2024-03-09 13:22:24,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4746.666666666667, ans=0.7338666666666667
2024-03-09 13:22:27,768 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4746.666666666667, ans=0.27749999999999997
2024-03-09 13:22:35,860 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4813.333333333333, ans=0.7315333333333334
2024-03-09 13:22:45,904 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4813.333333333333, ans=0.27437500000000004
2024-03-09 13:22:47,535 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.973e+00
2024-03-09 13:22:55,078 INFO [train.py:997] (0/4) Epoch 5, batch 100, loss[loss=0.3652, simple_loss=0.3619, pruned_loss=0.1618, over 24171.00 frames. ], tot_loss[loss=0.3607, simple_loss=0.3537, pruned_loss=0.1657, over 1883421.09 frames. ], batch size: 295, lr: 3.64e-02, grad_scale: 8.0
2024-03-09 13:22:56,970 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4880.0, ans=0.2512
2024-03-09 13:23:14,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4946.666666666667, ans=0.25053333333333333
2024-03-09 13:23:21,083 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4946.666666666667, ans=0.268125
2024-03-09 13:23:25,812 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4946.666666666667, ans=0.2742
2024-03-09 13:23:46,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=5080.0, ans=0.26187499999999997
2024-03-09 13:23:51,679 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.93 vs. limit=11.31
2024-03-09 13:23:52,691 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=5080.0, ans=0.26187499999999997
2024-03-09 13:23:55,837 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5080.0, ans=0.26187499999999997
2024-03-09 13:24:05,289 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.21 vs. limit=6.058666666666667
2024-03-09 13:24:11,529 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.22 vs. limit=11.36
2024-03-09 13:24:19,171 INFO [train.py:997] (0/4) Epoch 5, batch 150, loss[loss=0.3082, simple_loss=0.3134, pruned_loss=0.1242, over 23983.00 frames. ], tot_loss[loss=0.3569, simple_loss=0.3522, pruned_loss=0.1606, over 2528382.17 frames. ], batch size: 142, lr: 3.64e-02, grad_scale: 8.0
2024-03-09 13:24:32,004 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-5.pt
2024-03-09 13:25:15,972 INFO [train.py:997] (0/4) Epoch 6, batch 0, loss[loss=0.3086, simple_loss=0.3176, pruned_loss=0.1181, over 24218.00 frames. ], tot_loss[loss=0.3086, simple_loss=0.3176, pruned_loss=0.1181, over 24218.00 frames. ], batch size: 198, lr: 3.40e-02, grad_scale: 16.0
2024-03-09 13:25:15,973 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 13:25:26,278 INFO [train.py:1029] (0/4) Epoch 6, validation: loss=0.3173, simple_loss=0.3385, pruned_loss=0.1003, over 452978.00 frames.
2024-03-09 13:25:26,279 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 13:25:53,996 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=5333.333333333333, ans=0.044444444444444446
2024-03-09 13:25:55,624 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=5333.333333333333, ans=0.25
2024-03-09 13:26:01,735 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.162e+02 1.753e+02 2.102e+02 2.732e+02 4.816e+02, threshold=4.205e+02, percent-clipped=1.0
2024-03-09 13:26:02,099 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5333.333333333333, ans=0.25
2024-03-09 13:26:16,584 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5400.0, ans=0.246
2024-03-09 13:26:26,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=5466.666666666667, ans=0.009681159420289855
2024-03-09 13:26:38,921 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=5533.333333333333, ans=0.7063333333333334
2024-03-09 13:26:56,060 INFO [train.py:997] (0/4) Epoch 6, batch 50, loss[loss=0.2924, simple_loss=0.3065, pruned_loss=0.1054, over 23969.00 frames. ], tot_loss[loss=0.3137, simple_loss=0.3218, pruned_loss=0.1231, over 1071719.59 frames. ], batch size: 142, lr: 3.40e-02, grad_scale: 16.0
2024-03-09 13:26:56,477 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=5600.0, ans=0.2375
2024-03-09 13:27:26,908 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5733.333333333333, ans=0.24266666666666667
2024-03-09 13:27:34,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5733.333333333333, ans=0.24266666666666667
2024-03-09 13:27:46,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5800.0, ans=0.22812500000000002
2024-03-09 13:27:55,490 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=5800.0, ans=0.22812500000000002
2024-03-09 13:28:06,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=5866.666666666667, ans=0.042222222222222223
2024-03-09 13:28:17,513 INFO [train.py:997] (0/4) Epoch 6, batch 100, loss[loss=0.2948, simple_loss=0.3098, pruned_loss=0.1078, over 24268.00 frames. ], tot_loss[loss=0.3142, simple_loss=0.3237, pruned_loss=0.1227, over 1890983.73 frames. ], batch size: 254, lr: 3.40e-02, grad_scale: 8.0
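In the batch lines, loss[...] is the current batch (weighted by its frame count), while tot_loss[...] is a frame-weighted running aggregate since the last reset, which is why it is reported "over" an ever-growing frame total. A small sketch of that bookkeeping; icefall keeps it in a MetricsTracker, and this only illustrates the idea:

class RunningLoss:
    def __init__(self):
        self.frames = 0.0
        self.sums = {"loss": 0.0, "simple_loss": 0.0, "pruned_loss": 0.0}

    def update(self, batch_metrics, num_frames):
        self.frames += num_frames
        for k, v in batch_metrics.items():
            self.sums[k] += v          # per-batch losses are summed over frames

    def __str__(self):
        avg = {k: v / self.frames for k, v in self.sums.items()}
        return (", ".join(f"{k}={v:.4g}" for k, v in avg.items())
                + f", over {self.frames:.2f} frames.")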
2024-03-09 13:28:26,496 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.55 vs. limit=9.725
2024-03-09 13:28:37,604 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.95 vs. limit=12.0
2024-03-09 13:28:45,985 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=6000.0, ans=0.21875
2024-03-09 13:28:47,226 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.031e+02 1.395e+02 1.660e+02 2.447e+02 5.591e+02, threshold=3.319e+02, percent-clipped=4.0
2024-03-09 13:29:17,511 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=6133.333333333333, ans=0.21250000000000002
2024-03-09 13:29:35,092 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=6200.0, ans=0.20937499999999998
2024-03-09 13:29:40,034 INFO [train.py:997] (0/4) Epoch 6, batch 150, loss[loss=0.2647, simple_loss=0.2819, pruned_loss=0.09391, over 23806.00 frames. ], tot_loss[loss=0.3088, simple_loss=0.3202, pruned_loss=0.1188, over 2528188.92 frames. ], batch size: 129, lr: 3.39e-02, grad_scale: 8.0
2024-03-09 13:29:52,943 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-6.pt
2024-03-09 13:30:37,223 INFO [train.py:997] (0/4) Epoch 7, batch 0, loss[loss=0.2559, simple_loss=0.2774, pruned_loss=0.08407, over 23593.00 frames. ], tot_loss[loss=0.2559, simple_loss=0.2774, pruned_loss=0.08407, over 23593.00 frames. ], batch size: 128, lr: 3.18e-02, grad_scale: 16.0
2024-03-09 13:30:37,224 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 13:30:47,284 INFO [train.py:1029] (0/4) Epoch 7, validation: loss=0.2933, simple_loss=0.3253, pruned_loss=0.08566, over 452978.00 frames.
2024-03-09 13:30:47,285 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 13:31:20,903 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.93 vs. limit=8.193333333333333
2024-03-09 13:32:16,187 INFO [train.py:997] (0/4) Epoch 7, batch 50, loss[loss=0.2581, simple_loss=0.2825, pruned_loss=0.08409, over 24217.00 frames. ], tot_loss[loss=0.2845, simple_loss=0.3038, pruned_loss=0.1016, over 1055468.09 frames. ], batch size: 229, lr: 3.18e-02, grad_scale: 16.0
2024-03-09 13:32:20,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=6653.333333333333, ans=0.188125
2024-03-09 13:32:27,931 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=6653.333333333333, ans=0.03894444444444445
2024-03-09 13:32:30,835 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.025e+02 1.360e+02 1.605e+02 1.865e+02 3.683e+02, threshold=3.211e+02, percent-clipped=2.0
2024-03-09 13:32:36,652 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.17 vs. limit=10.02
limit=10.02 2024-03-09 13:32:44,490 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.93 vs. limit=6.68 2024-03-09 13:32:51,691 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-03-09 13:33:05,744 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=6853.333333333333, ans=0.17875000000000002 2024-03-09 13:33:12,012 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=6853.333333333333, ans=0.17875000000000002 2024-03-09 13:33:19,168 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.92 vs. limit=10.095 2024-03-09 13:33:36,452 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.86 vs. limit=4.048 2024-03-09 13:33:37,020 INFO [train.py:997] (0/4) Epoch 7, batch 100, loss[loss=0.2914, simple_loss=0.3133, pruned_loss=0.1053, over 24106.00 frames. ], tot_loss[loss=0.2803, simple_loss=0.3019, pruned_loss=0.09807, over 1872137.09 frames. ], batch size: 344, lr: 3.18e-02, grad_scale: 16.0 2024-03-09 13:33:51,191 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=6986.666666666667, ans=0.6554666666666666 2024-03-09 13:34:06,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=7053.333333333333, ans=0.22946666666666665 2024-03-09 13:34:27,765 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.90 vs. limit=6.796666666666667 2024-03-09 13:34:31,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=7186.666666666667, ans=0.16312500000000002 2024-03-09 13:34:49,287 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=7253.333333333333, ans=0.15999999999999998 2024-03-09 13:34:55,333 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.36 vs. limit=10.22 2024-03-09 13:34:58,873 INFO [train.py:997] (0/4) Epoch 7, batch 150, loss[loss=0.2432, simple_loss=0.2734, pruned_loss=0.07554, over 23974.00 frames. ], tot_loss[loss=0.2791, simple_loss=0.3024, pruned_loss=0.09717, over 2506566.13 frames. ], batch size: 142, lr: 3.18e-02, grad_scale: 16.0 2024-03-09 13:35:05,428 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.95 vs. limit=10.245000000000001 2024-03-09 13:35:11,783 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-7.pt 2024-03-09 13:35:57,455 INFO [train.py:997] (0/4) Epoch 8, batch 0, loss[loss=0.2711, simple_loss=0.3, pruned_loss=0.0905, over 24243.00 frames. ], tot_loss[loss=0.2711, simple_loss=0.3, pruned_loss=0.0905, over 24243.00 frames. 
], batch size: 311, lr: 2.99e-02, grad_scale: 32.0 2024-03-09 13:35:57,456 INFO [train.py:1020] (0/4) Computing validation loss 2024-03-09 13:36:07,342 INFO [train.py:1029] (0/4) Epoch 8, validation: loss=0.2797, simple_loss=0.3212, pruned_loss=0.07915, over 452978.00 frames. 2024-03-09 13:36:07,343 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB 2024-03-09 13:36:08,863 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 1.023e+02 1.314e+02 1.638e+02 1.955e+02 4.296e+02, threshold=3.277e+02, percent-clipped=3.0 2024-03-09 13:36:38,971 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=7440.0, ans=9.65 2024-03-09 13:36:56,251 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=7573.333333333333, ans=0.035111111111111114 2024-03-09 13:37:15,867 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=7640.0, ans=0.009208695652173913 2024-03-09 13:37:19,021 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=7640.0, ans=0.009208695652173913 2024-03-09 13:37:28,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=7640.0, ans=0.1 2024-03-09 13:37:28,385 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=7640.0, ans=0.034833333333333334 2024-03-09 13:37:31,057 INFO [train.py:997] (0/4) Epoch 8, batch 50, loss[loss=0.3296, simple_loss=0.3471, pruned_loss=0.1336, over 23635.00 frames. ], tot_loss[loss=0.264, simple_loss=0.2946, pruned_loss=0.08656, over 1075406.85 frames. ], batch size: 485, lr: 2.99e-02, grad_scale: 32.0 2024-03-09 13:37:39,300 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=7706.666666666667, ans=0.6302666666666668 2024-03-09 13:37:47,932 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.97 vs. limit=8.886666666666667 2024-03-09 13:37:50,756 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.28 vs. limit=10.415 2024-03-09 13:37:52,316 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.76 vs. limit=6.943333333333333 2024-03-09 13:38:39,073 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.67 vs. limit=10.49 2024-03-09 13:38:51,043 INFO [train.py:997] (0/4) Epoch 8, batch 100, loss[loss=0.2388, simple_loss=0.2735, pruned_loss=0.07445, over 22787.00 frames. ], tot_loss[loss=0.2595, simple_loss=0.2915, pruned_loss=0.08459, over 1880455.62 frames. ], batch size: 85, lr: 2.99e-02, grad_scale: 32.0 2024-03-09 13:38:52,573 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.761e+01 1.115e+02 1.336e+02 1.652e+02 2.844e+02, threshold=2.672e+02, percent-clipped=0.0 2024-03-09 13:38:58,194 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.85 vs. 
limit=7.01 2024-03-09 13:38:59,614 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.89 vs. limit=7.01 2024-03-09 13:39:06,080 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.95 vs. limit=13.530000000000001 2024-03-09 13:39:14,740 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-03-09 13:39:16,977 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.81 vs. limit=13.58 2024-03-09 13:39:48,492 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=8240.0, ans=0.125 2024-03-09 13:39:56,194 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=8306.666666666666, ans=0.21693333333333334 2024-03-09 13:39:57,732 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=8306.666666666666, ans=0.6092666666666667 2024-03-09 13:40:00,755 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=8306.666666666666, ans=0.125 2024-03-09 13:40:08,186 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=8306.666666666666, ans=0.04949747468305833 2024-03-09 13:40:12,947 INFO [train.py:997] (0/4) Epoch 8, batch 150, loss[loss=0.2394, simple_loss=0.2773, pruned_loss=0.07394, over 24264.00 frames. ], tot_loss[loss=0.2576, simple_loss=0.2911, pruned_loss=0.08367, over 2514481.08 frames. ], batch size: 188, lr: 2.99e-02, grad_scale: 16.0 2024-03-09 13:40:25,406 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-8.pt 2024-03-09 13:41:11,732 INFO [train.py:997] (0/4) Epoch 9, batch 0, loss[loss=0.2605, simple_loss=0.2977, pruned_loss=0.0851, over 24156.00 frames. ], tot_loss[loss=0.2605, simple_loss=0.2977, pruned_loss=0.0851, over 24156.00 frames. ], batch size: 366, lr: 2.83e-02, grad_scale: 32.0 2024-03-09 13:41:11,733 INFO [train.py:1020] (0/4) Computing validation loss 2024-03-09 13:41:21,825 INFO [train.py:1029] (0/4) Epoch 9, validation: loss=0.2624, simple_loss=0.312, pruned_loss=0.07326, over 452978.00 frames. 2024-03-09 13:41:21,826 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB 2024-03-09 13:41:26,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=8426.666666666666, ans=0.125 2024-03-09 13:42:20,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=8626.666666666666, ans=0.16373333333333334 2024-03-09 13:42:41,967 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.295e+01 1.084e+02 1.217e+02 1.477e+02 3.480e+02, threshold=2.433e+02, percent-clipped=5.0 2024-03-09 13:42:49,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=8760.0, ans=0.125 2024-03-09 13:42:51,114 INFO [train.py:997] (0/4) Epoch 9, batch 50, loss[loss=0.2396, simple_loss=0.2865, pruned_loss=0.06842, over 24059.00 frames. 
], tot_loss[loss=0.2398, simple_loss=0.2806, pruned_loss=0.0732, over 1069856.85 frames. ], batch size: 365, lr: 2.83e-02, grad_scale: 32.0 2024-03-09 13:43:06,796 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=8826.666666666666, ans=0.5910666666666667 2024-03-09 13:43:10,009 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-03-09 13:43:33,159 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=8893.333333333334, ans=0.008936231884057972 2024-03-09 13:43:39,162 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=8960.0, ans=0.125 2024-03-09 13:43:52,484 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.15 vs. limit=7.256666666666666 2024-03-09 13:44:03,991 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=9026.666666666666, ans=0.125 2024-03-09 13:44:05,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=9026.666666666666, ans=0.125 2024-03-09 13:44:08,288 INFO [train.py:997] (0/4) Epoch 9, batch 100, loss[loss=0.2258, simple_loss=0.2734, pruned_loss=0.06407, over 23887.00 frames. ], tot_loss[loss=0.2387, simple_loss=0.2809, pruned_loss=0.07281, over 1888807.17 frames. ], batch size: 129, lr: 2.83e-02, grad_scale: 32.0 2024-03-09 13:44:17,025 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.73 vs. limit=14.32 2024-03-09 13:44:21,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=9093.333333333334, ans=0.5817333333333334 2024-03-09 13:44:27,285 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=9160.0, ans=0.028500000000000004 2024-03-09 13:44:31,894 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=9160.0, ans=0.028500000000000004 2024-03-09 13:45:00,475 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=9293.333333333334, ans=0.125 2024-03-09 13:45:05,104 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=9293.333333333334, ans=0.04949747468305833 2024-03-09 13:45:07,141 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=9293.333333333334, ans=0.125 2024-03-09 13:45:20,413 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.522e+01 1.120e+02 1.341e+02 1.607e+02 2.660e+02, threshold=2.681e+02, percent-clipped=5.0 2024-03-09 13:45:27,471 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=9360.0, ans=0.02766666666666667 2024-03-09 13:45:30,101 INFO [train.py:997] (0/4) Epoch 9, batch 150, loss[loss=0.2241, simple_loss=0.2709, pruned_loss=0.06654, over 24266.00 frames. ], tot_loss[loss=0.2389, simple_loss=0.2821, pruned_loss=0.07342, over 2526261.32 frames. 
], batch size: 229, lr: 2.82e-02, grad_scale: 32.0 2024-03-09 13:45:30,421 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=9426.666666666666, ans=0.027388888888888893 2024-03-09 13:45:42,606 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-9.pt 2024-03-09 13:46:27,250 INFO [train.py:997] (0/4) Epoch 10, batch 0, loss[loss=0.2235, simple_loss=0.2713, pruned_loss=0.06582, over 24276.00 frames. ], tot_loss[loss=0.2235, simple_loss=0.2713, pruned_loss=0.06582, over 24276.00 frames. ], batch size: 254, lr: 2.69e-02, grad_scale: 32.0 2024-03-09 13:46:27,251 INFO [train.py:1020] (0/4) Computing validation loss 2024-03-09 13:46:37,029 INFO [train.py:1029] (0/4) Epoch 10, validation: loss=0.2538, simple_loss=0.3122, pruned_loss=0.07122, over 452978.00 frames. 2024-03-09 13:46:37,030 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB 2024-03-09 13:46:45,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=9480.0, ans=0.125 2024-03-09 13:46:50,615 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.78 vs. limit=11.055 2024-03-09 13:47:23,078 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=9613.333333333334, ans=0.125 2024-03-09 13:47:24,718 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=9613.333333333334, ans=0.20386666666666667 2024-03-09 13:47:40,482 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.40 vs. limit=11.129999999999999 2024-03-09 13:47:53,550 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=9746.666666666666, ans=0.035 2024-03-09 13:48:02,513 INFO [train.py:997] (0/4) Epoch 10, batch 50, loss[loss=0.2043, simple_loss=0.2603, pruned_loss=0.05221, over 24263.00 frames. ], tot_loss[loss=0.2297, simple_loss=0.2784, pruned_loss=0.06936, over 1062874.58 frames. 
], batch size: 188, lr: 2.68e-02, grad_scale: 32.0 2024-03-09 13:48:32,567 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=9946.666666666666, ans=0.20053333333333334 2024-03-09 13:48:37,260 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=9946.666666666666, ans=0.5518666666666667 2024-03-09 13:48:37,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=9946.666666666666, ans=0.025222222222222226 2024-03-09 13:48:43,337 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=9946.666666666666, ans=0.125 2024-03-09 13:48:58,432 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.812e+01 1.075e+02 1.246e+02 1.479e+02 2.668e+02, threshold=2.491e+02, percent-clipped=0.0 2024-03-09 13:49:00,333 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=10013.333333333334, ans=0.0 2024-03-09 13:49:21,953 INFO [train.py:997] (0/4) Epoch 10, batch 100, loss[loss=0.2463, simple_loss=0.2959, pruned_loss=0.08028, over 23787.00 frames. ], tot_loss[loss=0.2255, simple_loss=0.2761, pruned_loss=0.06721, over 1871919.70 frames. ], batch size: 447, lr: 2.68e-02, grad_scale: 32.0 2024-03-09 13:49:23,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=10146.666666666666, ans=0.125 2024-03-09 13:49:48,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=10213.333333333334, ans=0.125 2024-03-09 13:50:06,572 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=10280.0, ans=0.125 2024-03-09 13:50:29,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=10413.333333333334, ans=0.2 2024-03-09 13:50:33,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=10413.333333333334, ans=0.5355333333333334 2024-03-09 13:50:43,576 INFO [train.py:997] (0/4) Epoch 10, batch 150, loss[loss=0.2148, simple_loss=0.2699, pruned_loss=0.06309, over 23085.00 frames. ], tot_loss[loss=0.2241, simple_loss=0.2763, pruned_loss=0.06681, over 2516959.31 frames. ], batch size: 101, lr: 2.68e-02, grad_scale: 32.0 2024-03-09 13:50:55,789 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-10.pt 2024-03-09 13:51:41,272 INFO [train.py:997] (0/4) Epoch 11, batch 0, loss[loss=0.205, simple_loss=0.262, pruned_loss=0.05741, over 24262.00 frames. ], tot_loss[loss=0.205, simple_loss=0.262, pruned_loss=0.05741, over 24262.00 frames. ], batch size: 208, lr: 2.56e-02, grad_scale: 32.0 2024-03-09 13:51:41,273 INFO [train.py:1020] (0/4) Computing validation loss 2024-03-09 13:51:51,065 INFO [train.py:1029] (0/4) Epoch 11, validation: loss=0.2397, simple_loss=0.3066, pruned_loss=0.06689, over 452978.00 frames. 
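The ScheduledFloat entries above track module hyperparameters (skip rates, balancer probabilities, dropout) that are annealed against batch_count. A minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the breakpoints below are illustrative, chosen so the value at batch_count=5333.33 reproduces the attention_skip_rate ans=0.0444... logged earlier (the actual schedules are defined in icefall's scaling.py):

    import bisect

    class PiecewiseLinearSchedule:
        """A float hyperparameter interpolated linearly against the batch count.

        Illustrative breakpoints: (0, 0.2) -> (4000, 0.05) -> (16000, 0.0).
        """

        def __init__(self, *points):
            self.xs = [x for x, _ in points]
            self.ys = [y for _, y in points]

        def __call__(self, batch_count: float) -> float:
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect.bisect_right(self.xs, batch_count) - 1
            t = (batch_count - self.xs[i]) / (self.xs[i + 1] - self.xs[i])
            return self.ys[i] + t * (self.ys[i + 1] - self.ys[i])

    attention_skip_rate = PiecewiseLinearSchedule((0, 0.2), (4000, 0.05), (16000, 0.0))
    print(attention_skip_rate(5333.333333333333))  # 0.0444..., matching the log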
2024-03-09 13:51:51,066 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 13:51:57,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=10533.333333333334, ans=0.125
2024-03-09 13:52:29,295 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=10666.666666666666, ans=0.19333333333333336
2024-03-09 13:52:38,414 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.688e+01 1.049e+02 1.183e+02 1.464e+02 2.170e+02, threshold=2.365e+02, percent-clipped=0.0
2024-03-09 13:52:41,839 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=10733.333333333334, ans=0.125
2024-03-09 13:52:46,671 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.99 vs. limit=15.55
2024-03-09 13:52:47,422 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=10733.333333333334, ans=0.125
2024-03-09 13:52:58,317 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=10733.333333333334, ans=0.125
2024-03-09 13:53:18,218 INFO [train.py:997] (0/4) Epoch 11, batch 50, loss[loss=0.2075, simple_loss=0.2698, pruned_loss=0.05717, over 24078.00 frames. ], tot_loss[loss=0.2118, simple_loss=0.2702, pruned_loss=0.0609, over 1066971.02 frames. ], batch size: 344, lr: 2.56e-02, grad_scale: 32.0
2024-03-09 13:53:24,185 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.47 vs. limit=4.63
2024-03-09 13:53:29,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=10866.666666666666, ans=0.5196666666666667
2024-03-09 13:53:32,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=10933.333333333334, ans=0.02111111111111111
2024-03-09 13:53:34,305 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=10933.333333333334, ans=0.125
2024-03-09 13:54:07,809 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=11066.666666666666, ans=0.5126666666666667
2024-03-09 13:54:38,693 INFO [train.py:997] (0/4) Epoch 11, batch 100, loss[loss=0.1802, simple_loss=0.2444, pruned_loss=0.04494, over 23777.00 frames. ], tot_loss[loss=0.21, simple_loss=0.27, pruned_loss=0.06026, over 1892829.29 frames. ], batch size: 117, lr: 2.55e-02, grad_scale: 32.0
2024-03-09 13:54:51,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=11200.0, ans=0.188
2024-03-09 13:55:14,948 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=11333.333333333334, ans=0.18666666666666665
2024-03-09 13:55:17,917 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=11333.333333333334, ans=0.125
2024-03-09 13:55:23,737 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.186e+01 9.979e+01 1.131e+02 1.409e+02 2.515e+02, threshold=2.263e+02, percent-clipped=1.0
2024-03-09 13:55:58,234 INFO [train.py:997] (0/4) Epoch 11, batch 150, loss[loss=0.2023, simple_loss=0.272, pruned_loss=0.05463, over 24240.00 frames. ], tot_loss[loss=0.2106, simple_loss=0.2715, pruned_loss=0.06134, over 2521132.17 frames. ], batch size: 254, lr: 2.55e-02, grad_scale: 32.0
2024-03-09 13:55:59,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=11533.333333333334, ans=0.18466666666666665
2024-03-09 13:56:10,361 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-11.pt
2024-03-09 13:56:55,629 INFO [train.py:997] (0/4) Epoch 12, batch 0, loss[loss=0.1903, simple_loss=0.2616, pruned_loss=0.04806, over 24261.00 frames. ], tot_loss[loss=0.1903, simple_loss=0.2616, pruned_loss=0.04806, over 24261.00 frames. ], batch size: 254, lr: 2.45e-02, grad_scale: 32.0
2024-03-09 13:56:55,630 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 13:57:03,914 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.5088, 3.5142, 3.5759, 2.8665], device='cuda:0')
2024-03-09 13:57:05,244 INFO [train.py:1029] (0/4) Epoch 12, validation: loss=0.2325, simple_loss=0.3061, pruned_loss=0.06737, over 452978.00 frames.
2024-03-09 13:57:05,244 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 13:57:27,413 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=11653.333333333334, ans=0.49213333333333337
2024-03-09 13:57:28,938 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=11653.333333333334, ans=0.018111111111111106
2024-03-09 13:57:44,298 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=11720.0, ans=0.125
2024-03-09 13:57:48,928 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=11720.0, ans=0.125
2024-03-09 13:58:10,085 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=11786.666666666666, ans=0.4874666666666667
2024-03-09 13:58:17,766 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=11853.333333333334, ans=0.0
2024-03-09 13:58:28,232 INFO [train.py:997] (0/4) Epoch 12, batch 50, loss[loss=0.1965, simple_loss=0.2713, pruned_loss=0.05139, over 24224.00 frames. ], tot_loss[loss=0.1973, simple_loss=0.2653, pruned_loss=0.05485, over 1077039.37 frames. ], batch size: 327, lr: 2.44e-02, grad_scale: 32.0
2024-03-09 13:58:58,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=12053.333333333334, ans=0.125
2024-03-09 13:58:59,612 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.173e+01 9.982e+01 1.112e+02 1.363e+02 2.435e+02, threshold=2.224e+02, percent-clipped=1.0
2024-03-09 13:59:49,525 INFO [train.py:997] (0/4) Epoch 12, batch 100, loss[loss=0.1862, simple_loss=0.2546, pruned_loss=0.0525, over 24235.00 frames. ], tot_loss[loss=0.1959, simple_loss=0.2653, pruned_loss=0.05464, over 1895207.18 frames. ], batch size: 188, lr: 2.44e-02, grad_scale: 32.0
2024-03-09 14:00:02,426 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=12253.333333333334, ans=0.38380000000000003
2024-03-09 14:00:11,544 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=12320.0, ans=0.125
2024-03-09 14:00:20,665 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=12386.666666666666, ans=0.17613333333333334
2024-03-09 14:01:06,264 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=12520.0, ans=0.008147826086956522
2024-03-09 14:01:06,302 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=12520.0, ans=0.125
2024-03-09 14:01:09,127 INFO [train.py:997] (0/4) Epoch 12, batch 150, loss[loss=0.1658, simple_loss=0.2463, pruned_loss=0.03773, over 21582.00 frames. ], tot_loss[loss=0.196, simple_loss=0.2664, pruned_loss=0.05543, over 2517775.23 frames. ], batch size: 718, lr: 2.44e-02, grad_scale: 32.0
2024-03-09 14:01:21,367 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-12.pt
2024-03-09 14:02:05,581 INFO [train.py:997] (0/4) Epoch 13, batch 0, loss[loss=0.1865, simple_loss=0.2604, pruned_loss=0.05207, over 24264.00 frames. ], tot_loss[loss=0.1865, simple_loss=0.2604, pruned_loss=0.05207, over 24264.00 frames. ], batch size: 241, lr: 2.34e-02, grad_scale: 32.0
2024-03-09 14:02:05,582 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 14:02:18,484 INFO [train.py:1029] (0/4) Epoch 13, validation: loss=0.2245, simple_loss=0.307, pruned_loss=0.06618, over 452978.00 frames.
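In every optim.py WARNING above, the reported threshold is Clipping_scale times the median of the grad-norm quartiles (min/25%/median/75%/max); e.g. at 13:52:38, 2.0 x 1.183e+02 = 2.365e+02, with percent-clipped=0.0 meaning no recent batch exceeded it. A hedged sketch of that bookkeeping; the class name, buffer size, and clipping-by-rescaling details are assumptions for illustration, not the ScaledAdam implementation:

    import torch

    class MedianGradClipper:
        """Clip gradients against clipping_scale * median of recent grad norms."""

        def __init__(self, clipping_scale: float = 2.0, history: int = 200):
            self.clipping_scale = clipping_scale
            self.history = history
            self.norms = []  # grad norms of recent batches

        def __call__(self, params: list) -> float:
            grads = [p.grad.flatten() for p in params if p.grad is not None]
            norm = torch.cat(grads).norm().item()
            self.norms = (self.norms + [norm])[-self.history:]
            q = sorted(self.norms)
            quartiles = [q[int(i * (len(q) - 1) / 4)] for i in range(5)]
            threshold = self.clipping_scale * quartiles[2]  # scale * median
            if norm > threshold:  # this batch counts toward percent-clipped
                for p in params:
                    if p.grad is not None:
                        p.grad.mul_(threshold / norm)
            return threshold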
2024-03-09 14:02:18,486 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 14:02:21,157 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=12640.0, ans=16.98
2024-03-09 14:02:37,308 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.720e+01 1.064e+02 1.199e+02 1.343e+02 2.089e+02, threshold=2.398e+02, percent-clipped=0.0
2024-03-09 14:02:37,674 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=12706.666666666666, ans=0.013722222222222226
2024-03-09 14:02:51,834 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=12773.333333333334, ans=0.01344444444444444
2024-03-09 14:03:02,756 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=12773.333333333334, ans=0.125
2024-03-09 14:03:04,370 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=12773.333333333334, ans=0.125
2024-03-09 14:03:14,223 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.40 vs. limit=12.315000000000001
2024-03-09 14:03:15,169 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=12840.0, ans=0.013166666666666667
2024-03-09 14:03:19,807 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=12840.0, ans=0.125
2024-03-09 14:03:42,232 INFO [train.py:997] (0/4) Epoch 13, batch 50, loss[loss=0.1762, simple_loss=0.2494, pruned_loss=0.0494, over 24227.00 frames. ], tot_loss[loss=0.1848, simple_loss=0.2605, pruned_loss=0.05125, over 1061752.39 frames. ], batch size: 229, lr: 2.34e-02, grad_scale: 32.0
2024-03-09 14:04:12,634 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=13106.666666666666, ans=0.125
2024-03-09 14:04:23,502 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=13106.666666666666, ans=0.125
2024-03-09 14:04:55,043 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=13240.0, ans=0.43660000000000004
2024-03-09 14:04:57,421 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.59 vs. limit=11.620000000000001
2024-03-09 14:05:04,146 INFO [train.py:997] (0/4) Epoch 13, batch 100, loss[loss=0.1854, simple_loss=0.2647, pruned_loss=0.05286, over 24217.00 frames. ], tot_loss[loss=0.1833, simple_loss=0.2606, pruned_loss=0.05093, over 1878942.86 frames. ], batch size: 295, lr: 2.34e-02, grad_scale: 32.0
2024-03-09 14:05:22,070 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=13373.333333333334, ans=0.125
2024-03-09 14:05:22,112 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=13373.333333333334, ans=0.125
2024-03-09 14:05:24,891 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.077e+01 1.017e+02 1.138e+02 1.327e+02 1.773e+02, threshold=2.276e+02, percent-clipped=0.0
2024-03-09 14:05:29,761 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=13373.333333333334, ans=0.0
2024-03-09 14:05:35,929 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=13440.0, ans=0.125
2024-03-09 14:05:42,018 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=13440.0, ans=0.010666666666666672
2024-03-09 14:05:42,073 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=13440.0, ans=0.125
2024-03-09 14:06:10,670 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=13573.333333333334, ans=0.42493333333333333
2024-03-09 14:06:12,164 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=13573.333333333334, ans=0.0
2024-03-09 14:06:13,651 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=13573.333333333334, ans=0.16426666666666667
2024-03-09 14:06:23,139 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.85 vs. limit=8.393333333333334
2024-03-09 14:06:24,138 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=13640.0, ans=0.125
2024-03-09 14:06:25,353 INFO [train.py:997] (0/4) Epoch 13, batch 150, loss[loss=0.1975, simple_loss=0.2819, pruned_loss=0.05652, over 23794.00 frames. ], tot_loss[loss=0.183, simple_loss=0.2617, pruned_loss=0.05095, over 2509743.01 frames. ], batch size: 447, lr: 2.34e-02, grad_scale: 32.0
2024-03-09 14:06:26,242 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.31 vs. limit=9.456
2024-03-09 14:06:27,131 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=13640.0, ans=0.42260000000000003
2024-03-09 14:06:31,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=13640.0, ans=0.125
2024-03-09 14:06:33,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=13640.0, ans=0.125
2024-03-09 14:06:37,593 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-13.pt
2024-03-09 14:07:22,760 INFO [train.py:997] (0/4) Epoch 14, batch 0, loss[loss=0.1767, simple_loss=0.2619, pruned_loss=0.04574, over 24148.00 frames. ], tot_loss[loss=0.1767, simple_loss=0.2619, pruned_loss=0.04574, over 24148.00 frames. ], batch size: 345, lr: 2.25e-02, grad_scale: 32.0
2024-03-09 14:07:22,760 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 14:07:32,051 INFO [train.py:1029] (0/4) Epoch 14, validation: loss=0.2172, simple_loss=0.3059, pruned_loss=0.06427, over 452978.00 frames.
2024-03-09 14:07:32,052 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 14:07:46,513 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=13760.0, ans=0.00933333333333334
2024-03-09 14:08:38,098 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-03-09 14:08:48,933 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=13960.0, ans=0.007834782608695651
2024-03-09 14:08:51,018 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.47 vs. limit=9.584
2024-03-09 14:08:53,286 INFO [train.py:997] (0/4) Epoch 14, batch 50, loss[loss=0.1481, simple_loss=0.2405, pruned_loss=0.02789, over 21469.00 frames. ], tot_loss[loss=0.1787, simple_loss=0.2592, pruned_loss=0.04904, over 1071242.12 frames. ], batch size: 714, lr: 2.25e-02, grad_scale: 32.0
2024-03-09 14:08:59,478 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.009e+01 1.028e+02 1.152e+02 1.303e+02 2.373e+02, threshold=2.304e+02, percent-clipped=1.0
2024-03-09 14:09:23,496 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=14093.333333333334, ans=0.007805797101449276
2024-03-09 14:09:32,612 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=14160.0, ans=0.40440000000000004
2024-03-09 14:09:40,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=14226.666666666666, ans=0.125
2024-03-09 14:09:40,652 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.26 vs. limit=18.17
2024-03-09 14:10:12,224 INFO [train.py:997] (0/4) Epoch 14, batch 100, loss[loss=0.1726, simple_loss=0.2566, pruned_loss=0.04432, over 24220.00 frames. ], tot_loss[loss=0.1776, simple_loss=0.2588, pruned_loss=0.04818, over 1885745.77 frames. ], batch size: 241, lr: 2.25e-02, grad_scale: 32.0
2024-03-09 14:10:40,713 INFO [scaling.py:1023] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.39 vs. limit=6.8853333333333335
2024-03-09 14:10:47,376 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=14493.333333333334, ans=0.125
2024-03-09 14:11:08,614 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=14560.0, ans=0.125
2024-03-09 14:11:22,764 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.17 vs. limit=9.850666666666665
2024-03-09 14:11:35,856 INFO [train.py:997] (0/4) Epoch 14, batch 150, loss[loss=0.1914, simple_loss=0.2777, pruned_loss=0.05254, over 23993.00 frames. ], tot_loss[loss=0.179, simple_loss=0.2608, pruned_loss=0.04857, over 2514372.92 frames. ], batch size: 388, lr: 2.25e-02, grad_scale: 32.0
2024-03-09 14:11:41,695 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.629e+01 9.664e+01 1.070e+02 1.194e+02 2.380e+02, threshold=2.140e+02, percent-clipped=1.0
2024-03-09 14:11:47,749 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-14.pt
2024-03-09 14:12:33,830 INFO [train.py:997] (0/4) Epoch 15, batch 0, loss[loss=0.1984, simple_loss=0.283, pruned_loss=0.05691, over 23768.00 frames. ], tot_loss[loss=0.1984, simple_loss=0.283, pruned_loss=0.05691, over 23768.00 frames. ], batch size: 486, lr: 2.17e-02, grad_scale: 32.0
2024-03-09 14:12:33,831 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 14:12:40,159 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([1.2567, 2.3136, 2.0556, 2.0763, 2.1980, 2.1469, 2.0836, 2.2145], device='cuda:0')
2024-03-09 14:12:43,268 INFO [train.py:1029] (0/4) Epoch 15, validation: loss=0.2144, simple_loss=0.3029, pruned_loss=0.06295, over 452978.00 frames.
2024-03-09 14:12:43,269 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 14:13:01,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=14813.333333333334, ans=0.125
2024-03-09 14:13:14,630 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.90 vs. limit=8.703333333333333
2024-03-09 14:13:24,467 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=14880.0, ans=0.3792
2024-03-09 14:14:02,100 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=15013.333333333334, ans=0.09899494936611666
2024-03-09 14:14:04,899 INFO [train.py:997] (0/4) Epoch 15, batch 50, loss[loss=0.1676, simple_loss=0.2478, pruned_loss=0.04363, over 24107.00 frames. ], tot_loss[loss=0.1756, simple_loss=0.2578, pruned_loss=0.04672, over 1067826.22 frames. ], batch size: 176, lr: 2.17e-02, grad_scale: 32.0
2024-03-09 14:14:08,241 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=15080.0, ans=0.3722
2024-03-09 14:15:02,551 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=15280.0, ans=0.125
2024-03-09 14:15:08,573 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=15346.666666666666, ans=0.125
2024-03-09 14:15:10,080 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=15346.666666666666, ans=0.125
2024-03-09 14:15:19,050 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.102e+01 1.026e+02 1.164e+02 1.400e+02 2.237e+02, threshold=2.327e+02, percent-clipped=1.0
2024-03-09 14:15:27,120 INFO [train.py:997] (0/4) Epoch 15, batch 100, loss[loss=0.1698, simple_loss=0.2591, pruned_loss=0.04019, over 24259.00 frames. ], tot_loss[loss=0.1748, simple_loss=0.2574, pruned_loss=0.04613, over 1886144.59 frames. ], batch size: 295, lr: 2.17e-02, grad_scale: 32.0
2024-03-09 14:15:30,414 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=15413.333333333334, ans=0.0024444444444444435
2024-03-09 14:15:54,898 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=15480.0, ans=0.007504347826086957
2024-03-09 14:15:56,240 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=15480.0, ans=0.007504347826086957
2024-03-09 14:16:46,435 INFO [train.py:997] (0/4) Epoch 15, batch 150, loss[loss=0.1882, simple_loss=0.2646, pruned_loss=0.05595, over 23887.00 frames. ], tot_loss[loss=0.1737, simple_loss=0.2564, pruned_loss=0.04554, over 2498734.50 frames. ], batch size: 153, lr: 2.16e-02, grad_scale: 32.0
2024-03-09 14:16:58,744 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-15.pt
2024-03-09 14:17:45,381 INFO [train.py:997] (0/4) Epoch 16, batch 0, loss[loss=0.169, simple_loss=0.2606, pruned_loss=0.03866, over 23966.00 frames. ], tot_loss[loss=0.169, simple_loss=0.2606, pruned_loss=0.03866, over 23966.00 frames. ], batch size: 387, lr: 2.09e-02, grad_scale: 32.0
2024-03-09 14:17:45,382 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 14:17:55,604 INFO [train.py:1029] (0/4) Epoch 16, validation: loss=0.2134, simple_loss=0.3039, pruned_loss=0.06146, over 452978.00 frames.
2024-03-09 14:17:55,604 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 14:18:08,060 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=15800.0, ans=0.14200000000000002
2024-03-09 14:18:36,826 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.05 vs. limit=13.475
2024-03-09 14:18:51,994 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.08 vs. limit=13.5
2024-03-09 14:19:01,992 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=16000.0, ans=0.0
2024-03-09 14:19:03,237 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.463e+01 8.632e+01 1.007e+02 1.180e+02 1.868e+02, threshold=2.014e+02, percent-clipped=0.0
2024-03-09 14:19:21,889 INFO [train.py:997] (0/4) Epoch 16, batch 50, loss[loss=0.1638, simple_loss=0.2484, pruned_loss=0.0396, over 24164.00 frames. ], tot_loss[loss=0.1662, simple_loss=0.2508, pruned_loss=0.04081, over 1074508.98 frames. ], batch size: 217, lr: 2.09e-02, grad_scale: 32.0
2024-03-09 14:19:25,321 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=16133.333333333334, ans=0.0
2024-03-09 14:19:28,377 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=16133.333333333334, ans=0.125
2024-03-09 14:19:29,359 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.23 vs. limit=19.6
2024-03-09 14:19:39,059 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-03-09 14:19:39,732 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.66 vs. limit=5.43
2024-03-09 14:20:05,468 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=16266.666666666666, ans=0.125
2024-03-09 14:20:26,003 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.50 vs. limit=13.2
2024-03-09 14:20:28,279 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=16400.0, ans=0.0
2024-03-09 14:20:38,766 INFO [train.py:997] (0/4) Epoch 16, batch 100, loss[loss=0.1717, simple_loss=0.2491, pruned_loss=0.04711, over 24232.00 frames. ], tot_loss[loss=0.1677, simple_loss=0.2517, pruned_loss=0.04187, over 1892200.52 frames. ], batch size: 241, lr: 2.09e-02, grad_scale: 32.0
2024-03-09 14:21:16,213 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=16600.0, ans=0.025
2024-03-09 14:21:36,365 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=16666.666666666668, ans=0.1333333333333333
2024-03-09 14:21:43,561 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.243e+01 8.931e+01 9.706e+01 1.091e+02 1.368e+02, threshold=1.941e+02, percent-clipped=0.0
2024-03-09 14:22:02,412 INFO [train.py:997] (0/4) Epoch 16, batch 150, loss[loss=0.1989, simple_loss=0.2811, pruned_loss=0.05834, over 23724.00 frames. ], tot_loss[loss=0.1686, simple_loss=0.2528, pruned_loss=0.04221, over 2520276.84 frames. ], batch size: 486, lr: 2.09e-02, grad_scale: 32.0
2024-03-09 14:22:07,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=16800.0, ans=0.132
2024-03-09 14:22:14,613 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-16.pt
2024-03-09 14:23:00,943 INFO [train.py:997] (0/4) Epoch 17, batch 0, loss[loss=0.163, simple_loss=0.2512, pruned_loss=0.03741, over 24204.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2512, pruned_loss=0.03741, over 24204.00 frames. ], batch size: 295, lr: 2.02e-02, grad_scale: 32.0
2024-03-09 14:23:00,943 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 14:23:11,390 INFO [train.py:1029] (0/4) Epoch 17, validation: loss=0.215, simple_loss=0.3066, pruned_loss=0.06175, over 452978.00 frames.
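The grad_scale field in the per-batch lines (8.0, then 16.0, then 32.0 over the early epochs) is the dynamic loss scale that comes with use_fp16=True: the scale is doubled after a run of overflow-free steps and cut back when an inf/nan gradient voids a step. A minimal sketch of the mechanism with PyTorch's stock torch.cuda.amp.GradScaler; the toy model, init_scale, and growth_interval here are illustrative, not this recipe's settings:

    import torch
    from torch.cuda.amp import GradScaler, autocast

    model = torch.nn.Linear(80, 4852).cuda()
    optimizer = torch.optim.AdamW(model.parameters(), lr=0.045)
    scaler = GradScaler(init_scale=8.0, growth_interval=2000)

    for step in range(4000):
        optimizer.zero_grad()
        features = torch.randn(16, 80, device="cuda")
        with autocast():  # forward pass runs in fp16 where safe
            loss = model(features).logsumexp(dim=-1).mean()
        scaler.scale(loss).backward()  # scale loss so fp16 grads stay representable
        scaler.step(optimizer)         # unscales grads; skips the step on inf/nan
        scaler.update()                # doubles the scale after 2000 clean steps
    # scaler.get_scale() would now read 16.0 or 32.0, as in the grad_scale field.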
2024-03-09 14:23:11,391 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 14:23:27,234 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=16920.0, ans=0.0
2024-03-09 14:23:49,522 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=16986.666666666668, ans=0.125
2024-03-09 14:23:51,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=16986.666666666668, ans=0.0
2024-03-09 14:24:19,676 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=17120.0, ans=0.125
2024-03-09 14:24:36,071 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.62 vs. limit=20.39
2024-03-09 14:24:36,337 INFO [train.py:997] (0/4) Epoch 17, batch 50, loss[loss=0.1638, simple_loss=0.2541, pruned_loss=0.03677, over 24265.00 frames. ], tot_loss[loss=0.1667, simple_loss=0.2512, pruned_loss=0.04117, over 1074718.53 frames. ], batch size: 267, lr: 2.02e-02, grad_scale: 32.0
2024-03-09 14:24:59,753 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=7.830e-03
2024-03-09 14:25:04,214 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=17253.333333333332, ans=0.125
2024-03-09 14:25:22,879 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.440e+01 9.326e+01 1.031e+02 1.175e+02 1.521e+02, threshold=2.062e+02, percent-clipped=0.0
2024-03-09 14:25:37,793 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.89 vs. limit=14.044999999999998
2024-03-09 14:25:49,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=17453.333333333332, ans=0.125
2024-03-09 14:25:56,613 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.95 vs. limit=20.64
2024-03-09 14:25:57,090 INFO [train.py:997] (0/4) Epoch 17, batch 100, loss[loss=0.1625, simple_loss=0.2498, pruned_loss=0.03755, over 24270.00 frames. ], tot_loss[loss=0.1669, simple_loss=0.2516, pruned_loss=0.0411, over 1882719.50 frames. ], batch size: 254, lr: 2.02e-02, grad_scale: 32.0
2024-03-09 14:26:11,731 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.82 vs. limit=5.638
2024-03-09 14:26:21,750 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=17586.666666666668, ans=0.09899494936611666
2024-03-09 14:26:21,775 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=17586.666666666668, ans=0.125
2024-03-09 14:27:15,877 INFO [train.py:997] (0/4) Epoch 17, batch 150, loss[loss=0.1671, simple_loss=0.2548, pruned_loss=0.03968, over 24150.00 frames. ], tot_loss[loss=0.1668, simple_loss=0.2514, pruned_loss=0.04109, over 2517570.99 frames. ], batch size: 345, lr: 2.02e-02, grad_scale: 32.0
2024-03-09 14:27:28,466 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-17.pt
2024-03-09 14:28:12,293 INFO [train.py:997] (0/4) Epoch 18, batch 0, loss[loss=0.1544, simple_loss=0.2397, pruned_loss=0.03458, over 24280.00 frames. ], tot_loss[loss=0.1544, simple_loss=0.2397, pruned_loss=0.03458, over 24280.00 frames. ], batch size: 229, lr: 1.96e-02, grad_scale: 32.0
2024-03-09 14:28:12,294 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 14:28:22,756 INFO [train.py:1029] (0/4) Epoch 18, validation: loss=0.213, simple_loss=0.3039, pruned_loss=0.06107, over 452978.00 frames.
2024-03-09 14:28:22,756 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 14:28:29,520 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=17906.666666666668, ans=0.125
2024-03-09 14:28:32,533 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=17906.666666666668, ans=0.0
2024-03-09 14:28:46,261 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=17973.333333333332, ans=0.125
2024-03-09 14:28:47,088 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.11 vs. limit=13.986666666666666
2024-03-09 14:29:02,462 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.00 vs. limit=9.51
2024-03-09 14:29:02,778 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.169e+01 8.782e+01 9.645e+01 1.059e+02 1.496e+02, threshold=1.929e+02, percent-clipped=0.0
2024-03-09 14:29:10,816 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=18040.0, ans=0.125
2024-03-09 14:29:11,444 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.53 vs. limit=11.216000000000001
2024-03-09 14:29:22,502 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.49 vs. limit=14.29
2024-03-09 14:29:38,374 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=18173.333333333332, ans=0.125
2024-03-09 14:29:39,937 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=18173.333333333332, ans=0.006918840579710145
2024-03-09 14:29:44,479 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=18240.0, ans=0.125
2024-03-09 14:29:45,631 INFO [train.py:997] (0/4) Epoch 18, batch 50, loss[loss=0.1544, simple_loss=0.2399, pruned_loss=0.03446, over 24260.00 frames. ], tot_loss[loss=0.1633, simple_loss=0.2474, pruned_loss=0.03955, over 1069503.57 frames. ], batch size: 198, lr: 1.96e-02, grad_scale: 32.0
2024-03-09 14:30:20,324 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=18373.333333333332, ans=0.25693333333333346
2024-03-09 14:30:33,256 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.06 vs. limit=14.415
2024-03-09 14:30:37,354 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=18440.0, ans=0.0
2024-03-09 14:30:41,827 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=18440.0, ans=0.0
2024-03-09 14:31:00,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=18506.666666666668, ans=0.006846376811594203
2024-03-09 14:31:06,274 INFO [train.py:997] (0/4) Epoch 18, batch 100, loss[loss=0.163, simple_loss=0.249, pruned_loss=0.03848, over 24302.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.2477, pruned_loss=0.03865, over 1882939.63 frames. ], batch size: 241, lr: 1.96e-02, grad_scale: 32.0
2024-03-09 14:31:15,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=18573.333333333332, ans=0.06426666666666667
2024-03-09 14:31:39,032 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=18706.666666666668, ans=0.125
2024-03-09 14:31:41,829 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.054e+01 8.645e+01 9.593e+01 1.057e+02 1.559e+02, threshold=1.919e+02, percent-clipped=0.0
2024-03-09 14:31:49,368 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=18706.666666666668, ans=0.125
2024-03-09 14:31:52,347 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=18706.666666666668, ans=0.0
2024-03-09 14:31:56,916 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=18773.333333333332, ans=0.11226666666666668
2024-03-09 14:32:09,675 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=5.8260000000000005
2024-03-09 14:32:26,168 INFO [train.py:997] (0/4) Epoch 18, batch 150, loss[loss=0.1676, simple_loss=0.2505, pruned_loss=0.04236, over 24078.00 frames. ], tot_loss[loss=0.1638, simple_loss=0.2492, pruned_loss=0.03919, over 2521185.23 frames. ], batch size: 165, lr: 1.95e-02, grad_scale: 32.0
2024-03-09 14:32:38,333 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-18.pt
2024-03-09 14:33:23,372 INFO [train.py:997] (0/4) Epoch 19, batch 0, loss[loss=0.1729, simple_loss=0.264, pruned_loss=0.04085, over 24022.00 frames. ], tot_loss[loss=0.1729, simple_loss=0.264, pruned_loss=0.04085, over 24022.00 frames. ], batch size: 416, lr: 1.90e-02, grad_scale: 32.0
2024-03-09 14:33:23,373 INFO [train.py:1020] (0/4) Computing validation loss
2024-03-09 14:33:30,781 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.4285, 5.0987, 5.3784, 5.1244], device='cuda:0')
2024-03-09 14:33:35,286 INFO [train.py:1029] (0/4) Epoch 19, validation: loss=0.2133, simple_loss=0.3046, pruned_loss=0.061, over 452978.00 frames.
2024-03-09 14:33:35,287 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB
2024-03-09 14:33:38,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=18960.0, ans=0.024620000000000003
2024-03-09 14:34:06,394 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=19026.666666666668, ans=0.125
2024-03-09 14:34:31,196 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=19160.0, ans=0.10840000000000002
2024-03-09 14:34:35,845 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=19160.0, ans=0.125
2024-03-09 14:34:46,993 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.30 vs. limit=14.613333333333335
2024-03-09 14:34:51,171 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=19226.666666666668, ans=0.006689855072463767
2024-03-09 14:34:53,443 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.06 vs. limit=14.71
2024-03-09 14:34:55,506 INFO [train.py:997] (0/4) Epoch 19, batch 50, loss[loss=0.165, simple_loss=0.2532, pruned_loss=0.03836, over 24187.00 frames. ], tot_loss[loss=0.1603, simple_loss=0.2465, pruned_loss=0.03701, over 1071248.87 frames. ], batch size: 295, lr: 1.90e-02, grad_scale: 32.0
2024-03-09 14:34:55,787 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=19293.333333333332, ans=0.1070666666666667
2024-03-09 14:35:08,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=19293.333333333332, ans=0.00667536231884058
2024-03-09 14:35:08,250 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=19293.333333333332, ans=0.0
2024-03-09 14:35:17,297 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.127e+01 8.675e+01 9.444e+01 1.046e+02 1.924e+02, threshold=1.889e+02, percent-clipped=1.0
2024-03-09 14:35:42,436 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=19493.333333333332, ans=0.49239999999999995
2024-03-09 14:35:47,102 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=19493.333333333332, ans=0.05
2024-03-09 14:35:50,082 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=19493.333333333332, ans=0.0
2024-03-09 14:35:58,181 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=19560.0, ans=5.934
2024-03-09 14:36:16,165 INFO [train.py:997] (0/4) Epoch 19, batch 100, loss[loss=0.1576, simple_loss=0.2467, pruned_loss=0.03418, over 24199.00 frames. ], tot_loss[loss=0.1615, simple_loss=0.2474, pruned_loss=0.03783, over 1882403.70 frames. ], batch size: 280, lr: 1.90e-02, grad_scale: 32.0
2024-03-09 14:36:21,199 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=19626.666666666668, ans=0.10373333333333334
2024-03-09 14:36:27,269 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=19626.666666666668, ans=0.125
2024-03-09 14:36:32,737 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.18 vs. limit=14.86
2024-03-09 14:36:36,463 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=19693.333333333332, ans=0.21073333333333344
2024-03-09 14:36:39,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=19693.333333333332, ans=0.0
2024-03-09 14:36:43,199 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.34 vs.
limit=9.923333333333332 2024-03-09 14:36:53,056 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=19760.0, ans=0.10240000000000002 2024-03-09 14:36:54,546 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=19760.0, ans=0.10240000000000002 2024-03-09 14:37:14,318 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=19826.666666666668, ans=0.006559420289855072 2024-03-09 14:37:27,849 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=19893.333333333332, ans=0.0 2024-03-09 14:37:36,535 INFO [train.py:997] (0/4) Epoch 19, batch 150, loss[loss=0.2038, simple_loss=0.281, pruned_loss=0.06329, over 23262.00 frames. ], tot_loss[loss=0.1626, simple_loss=0.2488, pruned_loss=0.03824, over 2517229.94 frames. ], batch size: 534, lr: 1.89e-02, grad_scale: 32.0 2024-03-09 14:37:43,120 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=19960.0, ans=0.125 2024-03-09 14:37:49,377 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-19.pt 2024-03-09 14:38:31,428 INFO [train.py:997] (0/4) Epoch 20, batch 0, loss[loss=0.1662, simple_loss=0.2468, pruned_loss=0.04281, over 24051.00 frames. ], tot_loss[loss=0.1662, simple_loss=0.2468, pruned_loss=0.04281, over 24051.00 frames. ], batch size: 176, lr: 1.85e-02, grad_scale: 32.0 2024-03-09 14:38:31,429 INFO [train.py:1020] (0/4) Computing validation loss 2024-03-09 14:38:38,128 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.1197, 3.9310, 3.8895, 3.4368], device='cuda:0') 2024-03-09 14:38:40,964 INFO [train.py:1029] (0/4) Epoch 20, validation: loss=0.2111, simple_loss=0.3031, pruned_loss=0.05952, over 452978.00 frames. 2024-03-09 14:38:40,964 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB 2024-03-09 14:38:53,196 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.462e+01 8.448e+01 9.307e+01 1.038e+02 2.078e+02, threshold=1.861e+02, percent-clipped=1.0 2024-03-09 14:38:53,610 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=20013.333333333332, ans=0.1 2024-03-09 14:39:37,876 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=20213.333333333332, ans=0.125 2024-03-09 14:39:57,726 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=20280.0, ans=0.05 2024-03-09 14:39:59,315 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=20280.0, ans=0.0 2024-03-09 14:40:03,546 INFO [train.py:997] (0/4) Epoch 20, batch 50, loss[loss=0.1468, simple_loss=0.2296, pruned_loss=0.03198, over 23582.00 frames. ], tot_loss[loss=0.1552, simple_loss=0.241, pruned_loss=0.0347, over 1076970.64 frames. 
], batch size: 128, lr: 1.84e-02, grad_scale: 32.0 2024-03-09 14:41:11,939 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=20613.333333333332, ans=0.09899494936611666 2024-03-09 14:41:25,658 INFO [train.py:997] (0/4) Epoch 20, batch 100, loss[loss=0.1671, simple_loss=0.2498, pruned_loss=0.04219, over 24104.00 frames. ], tot_loss[loss=0.1591, simple_loss=0.2448, pruned_loss=0.03668, over 1894606.49 frames. ], batch size: 165, lr: 1.84e-02, grad_scale: 32.0 2024-03-09 14:41:34,816 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.448e+01 8.010e+01 8.832e+01 9.695e+01 1.353e+02, threshold=1.766e+02, percent-clipped=0.0 2024-03-09 14:41:40,236 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.63 vs. limit=12.0 2024-03-09 14:41:49,068 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=20746.666666666668, ans=0.2 2024-03-09 14:42:04,612 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.79 vs. limit=15.0 2024-03-09 14:42:04,942 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.21 vs. limit=15.0 2024-03-09 14:42:07,121 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=20813.333333333332, ans=0.07 2024-03-09 14:42:36,165 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=20946.666666666668, ans=0.0 2024-03-09 14:42:44,020 INFO [train.py:997] (0/4) Epoch 20, batch 150, loss[loss=0.1465, simple_loss=0.232, pruned_loss=0.03051, over 24252.00 frames. ], tot_loss[loss=0.1585, simple_loss=0.2445, pruned_loss=0.03627, over 2518931.36 frames. ], batch size: 229, lr: 1.84e-02, grad_scale: 32.0 2024-03-09 14:42:52,079 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=21013.333333333332, ans=0.125 2024-03-09 14:42:53,386 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=21013.333333333332, ans=0.006301449275362319 2024-03-09 14:42:56,093 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-20.pt 2024-03-09 14:43:39,749 INFO [train.py:997] (0/4) Epoch 21, batch 0, loss[loss=0.1557, simple_loss=0.2395, pruned_loss=0.03593, over 22561.00 frames. ], tot_loss[loss=0.1557, simple_loss=0.2395, pruned_loss=0.03593, over 22561.00 frames. ], batch size: 85, lr: 1.79e-02, grad_scale: 32.0 2024-03-09 14:43:39,750 INFO [train.py:1020] (0/4) Computing validation loss 2024-03-09 14:43:49,466 INFO [train.py:1029] (0/4) Epoch 21, validation: loss=0.2106, simple_loss=0.3015, pruned_loss=0.05984, over 452978.00 frames. 
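An aside on reading these records: each "loss[...]" block reports the current batch together with its frame count, while "tot_loss[...]" is a frame-weighted running average over the epoch so far; the fractional frame totals (e.g. 1076970.64) suggest the accumulator decays old batches rather than summing them outright. A minimal sketch of that bookkeeping, assuming a decay constant (the exact value used by train.py is not visible in this log):

    # Hypothetical decayed, frame-weighted loss tracker; the decay value
    # 0.999 is an assumption for illustration, not taken from this log.
    class RunningLoss:
        def __init__(self, decay: float = 0.999):
            self.decay = decay
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss: float, batch_frames: float) -> None:
            # Decay old statistics, then add the new batch weighted by frames.
            self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
            self.frames = self.decay * self.frames + batch_frames

        @property
        def value(self) -> float:
            return self.loss_sum / max(self.frames, 1.0)

    tracker = RunningLoss()
    tracker.update(batch_loss=0.1552, batch_frames=23582.0)
    print(f"tot_loss so far: {tracker.value:.4f}")
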
2024-03-09 14:43:49,467 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB 2024-03-09 14:44:14,231 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=21133.333333333332, ans=0.2 2024-03-09 14:44:25,168 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.598e-02 2024-03-09 14:44:26,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=21200.0, ans=0.125 2024-03-09 14:44:41,792 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=21266.666666666668, ans=0.125 2024-03-09 14:45:10,283 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.134e+01 8.236e+01 9.284e+01 1.075e+02 1.651e+02, threshold=1.857e+02, percent-clipped=0.0 2024-03-09 14:45:13,759 INFO [train.py:997] (0/4) Epoch 21, batch 50, loss[loss=0.1334, simple_loss=0.2286, pruned_loss=0.01914, over 21496.00 frames. ], tot_loss[loss=0.1576, simple_loss=0.2452, pruned_loss=0.03499, over 1066137.56 frames. ], batch size: 717, lr: 1.79e-02, grad_scale: 32.0 2024-03-09 14:45:18,606 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=21400.0, ans=0.125 2024-03-09 14:45:20,210 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=21400.0, ans=0.1 2024-03-09 14:45:23,329 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=21400.0, ans=0.1 2024-03-09 14:45:37,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=21466.666666666668, ans=0.0 2024-03-09 14:45:57,078 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.40 vs. limit=15.0 2024-03-09 14:46:00,728 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=21600.0, ans=0.2 2024-03-09 14:46:29,323 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.74 vs. limit=15.0 2024-03-09 14:46:32,881 INFO [train.py:997] (0/4) Epoch 21, batch 100, loss[loss=0.1936, simple_loss=0.2731, pruned_loss=0.05707, over 23234.00 frames. ], tot_loss[loss=0.1586, simple_loss=0.2461, pruned_loss=0.03551, over 1889581.54 frames. ], batch size: 534, lr: 1.79e-02, grad_scale: 64.0 2024-03-09 14:47:29,119 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.37 vs. 
limit=22.5 2024-03-09 14:47:35,803 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=21933.333333333332, ans=0.125 2024-03-09 14:47:48,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=22000.0, ans=0.95 2024-03-09 14:47:51,969 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.744e+01 8.144e+01 8.919e+01 1.026e+02 1.301e+02, threshold=1.784e+02, percent-clipped=0.0 2024-03-09 14:47:55,065 INFO [train.py:997] (0/4) Epoch 21, batch 150, loss[loss=0.133, simple_loss=0.2276, pruned_loss=0.0192, over 21551.00 frames. ], tot_loss[loss=0.1602, simple_loss=0.2472, pruned_loss=0.03658, over 2523300.87 frames. ], batch size: 718, lr: 1.79e-02, grad_scale: 64.0 2024-03-09 14:48:07,327 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-21.pt 2024-03-09 14:48:50,783 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.72 vs. limit=12.0 2024-03-09 14:48:51,238 INFO [train.py:997] (0/4) Epoch 22, batch 0, loss[loss=0.1625, simple_loss=0.2543, pruned_loss=0.03538, over 24015.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.2543, pruned_loss=0.03538, over 24015.00 frames. ], batch size: 416, lr: 1.74e-02, grad_scale: 64.0 2024-03-09 14:48:51,239 INFO [train.py:1020] (0/4) Computing validation loss 2024-03-09 14:49:00,964 INFO [train.py:1029] (0/4) Epoch 22, validation: loss=0.2117, simple_loss=0.3028, pruned_loss=0.06033, over 452978.00 frames. 2024-03-09 14:49:00,965 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB 2024-03-09 14:49:01,259 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=22120.0, ans=0.1 2024-03-09 14:49:12,246 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=22120.0, ans=0.2 2024-03-09 14:49:29,886 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.50 vs. limit=15.0 2024-03-09 14:49:56,487 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=22320.0, ans=0.2 2024-03-09 14:49:57,989 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=22320.0, ans=0.1 2024-03-09 14:50:07,153 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=22386.666666666668, ans=0.0 2024-03-09 14:50:23,732 INFO [train.py:997] (0/4) Epoch 22, batch 50, loss[loss=0.1616, simple_loss=0.2404, pruned_loss=0.04134, over 23927.00 frames. ], tot_loss[loss=0.1542, simple_loss=0.2416, pruned_loss=0.0334, over 1068791.91 frames. 
], batch size: 153, lr: 1.74e-02, grad_scale: 64.0 2024-03-09 14:50:33,403 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=22453.333333333332, ans=0.07 2024-03-09 14:50:38,103 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=22520.0, ans=0.0 2024-03-09 14:51:04,687 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=22586.666666666668, ans=0.0 2024-03-09 14:51:05,295 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.82 vs. limit=15.0 2024-03-09 14:51:18,924 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=22653.333333333332, ans=0.125 2024-03-09 14:51:28,016 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.973e+01 8.132e+01 8.918e+01 9.986e+01 1.265e+02, threshold=1.784e+02, percent-clipped=0.0 2024-03-09 14:51:45,175 INFO [train.py:997] (0/4) Epoch 22, batch 100, loss[loss=0.1544, simple_loss=0.2491, pruned_loss=0.02986, over 24063.00 frames. ], tot_loss[loss=0.1541, simple_loss=0.2412, pruned_loss=0.03344, over 1880311.48 frames. ], batch size: 365, lr: 1.74e-02, grad_scale: 64.0 2024-03-09 14:51:48,485 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=22786.666666666668, ans=0.125 2024-03-09 14:51:49,463 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.12 vs. limit=15.0 2024-03-09 14:52:05,341 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=22853.333333333332, ans=0.005901449275362319 2024-03-09 14:52:20,616 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=22920.0, ans=0.1 2024-03-09 14:52:28,211 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=22920.0, ans=0.125 2024-03-09 14:52:29,743 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=22920.0, ans=0.125 2024-03-09 14:52:37,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=22986.666666666668, ans=0.0 2024-03-09 14:52:52,569 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=23053.333333333332, ans=0.0 2024-03-09 14:52:52,667 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=23053.333333333332, ans=0.5 2024-03-09 14:53:00,363 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.62 vs. limit=6.0 2024-03-09 14:53:01,576 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=23053.333333333332, ans=0.005857971014492754 2024-03-09 14:53:05,950 INFO [train.py:997] (0/4) Epoch 22, batch 150, loss[loss=0.1542, simple_loss=0.2448, pruned_loss=0.03185, over 24194.00 frames. ], tot_loss[loss=0.1545, simple_loss=0.2424, pruned_loss=0.03335, over 2516578.38 frames. 
], batch size: 241, lr: 1.74e-02, grad_scale: 64.0 2024-03-09 14:53:15,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=23120.0, ans=0.125 2024-03-09 14:53:18,515 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-22.pt 2024-03-09 14:54:00,138 INFO [train.py:997] (0/4) Epoch 23, batch 0, loss[loss=0.1547, simple_loss=0.2325, pruned_loss=0.03842, over 20296.00 frames. ], tot_loss[loss=0.1547, simple_loss=0.2325, pruned_loss=0.03842, over 20296.00 frames. ], batch size: 60, lr: 1.70e-02, grad_scale: 64.0 2024-03-09 14:54:00,139 INFO [train.py:1020] (0/4) Computing validation loss 2024-03-09 14:54:09,892 INFO [train.py:1029] (0/4) Epoch 23, validation: loss=0.2115, simple_loss=0.3036, pruned_loss=0.0597, over 452978.00 frames. 2024-03-09 14:54:09,893 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB 2024-03-09 14:55:00,902 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=23373.333333333332, ans=0.125 2024-03-09 14:55:05,176 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.526e+01 7.783e+01 8.704e+01 9.596e+01 1.275e+02, threshold=1.741e+02, percent-clipped=0.0 2024-03-09 14:55:07,111 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=23373.333333333332, ans=0.125 2024-03-09 14:55:33,121 INFO [train.py:997] (0/4) Epoch 23, batch 50, loss[loss=0.1242, simple_loss=0.2224, pruned_loss=0.01296, over 21644.00 frames. ], tot_loss[loss=0.1508, simple_loss=0.2378, pruned_loss=0.03189, over 1055970.27 frames. ], batch size: 718, lr: 1.70e-02, grad_scale: 64.0 2024-03-09 14:55:35,060 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-03-09 14:55:50,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=23573.333333333332, ans=0.1 2024-03-09 14:55:53,715 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=23573.333333333332, ans=0.125 2024-03-09 14:56:01,428 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=23573.333333333332, ans=0.0 2024-03-09 14:56:07,601 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=23640.0, ans=0.0 2024-03-09 14:56:09,129 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=23640.0, ans=0.05 2024-03-09 14:56:51,640 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.48 vs. limit=15.0 2024-03-09 14:56:53,727 INFO [train.py:997] (0/4) Epoch 23, batch 100, loss[loss=0.1374, simple_loss=0.227, pruned_loss=0.02393, over 23991.00 frames. ], tot_loss[loss=0.1535, simple_loss=0.2411, pruned_loss=0.03292, over 1873457.87 frames. 
], batch size: 142, lr: 1.69e-02, grad_scale: 64.0 2024-03-09 14:57:00,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=23840.0, ans=0.125 2024-03-09 14:57:02,393 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.81 vs. limit=15.0 2024-03-09 14:57:09,351 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=23906.666666666668, ans=0.125 2024-03-09 14:57:45,468 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.240e+01 7.813e+01 8.574e+01 9.589e+01 1.326e+02, threshold=1.715e+02, percent-clipped=0.0 2024-03-09 14:58:13,656 INFO [train.py:997] (0/4) Epoch 23, batch 150, loss[loss=0.1579, simple_loss=0.2442, pruned_loss=0.03576, over 24253.00 frames. ], tot_loss[loss=0.1534, simple_loss=0.2413, pruned_loss=0.03276, over 2510509.13 frames. ], batch size: 198, lr: 1.69e-02, grad_scale: 64.0 2024-03-09 14:58:25,925 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-23.pt 2024-03-09 14:59:06,778 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.44 vs. limit=22.5 2024-03-09 14:59:07,185 INFO [train.py:997] (0/4) Epoch 24, batch 0, loss[loss=0.143, simple_loss=0.2308, pruned_loss=0.02763, over 20364.00 frames. ], tot_loss[loss=0.143, simple_loss=0.2308, pruned_loss=0.02763, over 20364.00 frames. ], batch size: 60, lr: 1.66e-02, grad_scale: 64.0 2024-03-09 14:59:07,185 INFO [train.py:1020] (0/4) Computing validation loss 2024-03-09 14:59:16,706 INFO [train.py:1029] (0/4) Epoch 24, validation: loss=0.2123, simple_loss=0.3043, pruned_loss=0.06014, over 452978.00 frames. 2024-03-09 14:59:16,707 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB 2024-03-09 14:59:49,533 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-03-09 14:59:51,054 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=24293.333333333332, ans=0.005588405797101449 2024-03-09 14:59:54,097 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-03-09 14:59:54,866 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.96 vs. limit=12.0 2024-03-09 15:00:26,516 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=24493.333333333332, ans=0.005544927536231884 2024-03-09 15:00:43,104 INFO [train.py:997] (0/4) Epoch 24, batch 50, loss[loss=0.1556, simple_loss=0.245, pruned_loss=0.03309, over 24205.00 frames. ], tot_loss[loss=0.1508, simple_loss=0.2378, pruned_loss=0.03194, over 1073196.93 frames. ], batch size: 295, lr: 1.65e-02, grad_scale: 64.0 2024-03-09 15:00:48,216 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=24560.0, ans=0.125 2024-03-09 15:00:52,062 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.04 vs. 
limit=10.0 2024-03-09 15:01:20,106 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.514e+01 7.866e+01 8.423e+01 9.105e+01 1.243e+02, threshold=1.685e+02, percent-clipped=0.0 2024-03-09 15:01:34,369 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=24760.0, ans=0.2 2024-03-09 15:02:03,700 INFO [train.py:997] (0/4) Epoch 24, batch 100, loss[loss=0.1498, simple_loss=0.2407, pruned_loss=0.02947, over 24119.00 frames. ], tot_loss[loss=0.1522, simple_loss=0.2396, pruned_loss=0.03236, over 1880690.09 frames. ], batch size: 345, lr: 1.65e-02, grad_scale: 64.0 2024-03-09 15:02:12,946 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=24893.333333333332, ans=0.04949747468305833 2024-03-09 15:02:31,615 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=24960.0, ans=0.0 2024-03-09 15:02:48,476 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=25026.666666666668, ans=0.125 2024-03-09 15:02:55,393 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.58 vs. limit=22.5 2024-03-09 15:02:56,119 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=25093.333333333332, ans=0.2 2024-03-09 15:03:24,739 INFO [train.py:997] (0/4) Epoch 24, batch 150, loss[loss=0.1926, simple_loss=0.2715, pruned_loss=0.05689, over 23321.00 frames. ], tot_loss[loss=0.1531, simple_loss=0.2406, pruned_loss=0.03286, over 2517082.11 frames. ], batch size: 534, lr: 1.65e-02, grad_scale: 64.0 2024-03-09 15:03:36,211 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-24.pt 2024-03-09 15:04:17,871 INFO [train.py:997] (0/4) Epoch 25, batch 0, loss[loss=0.1555, simple_loss=0.2383, pruned_loss=0.03632, over 23989.00 frames. ], tot_loss[loss=0.1555, simple_loss=0.2383, pruned_loss=0.03632, over 23989.00 frames. ], batch size: 165, lr: 1.61e-02, grad_scale: 64.0 2024-03-09 15:04:17,872 INFO [train.py:1020] (0/4) Computing validation loss 2024-03-09 15:04:27,731 INFO [train.py:1029] (0/4) Epoch 25, validation: loss=0.2123, simple_loss=0.3048, pruned_loss=0.05995, over 452978.00 frames. 2024-03-09 15:04:27,732 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB 2024-03-09 15:04:56,153 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.291e+01 7.825e+01 8.498e+01 9.317e+01 1.197e+02, threshold=1.700e+02, percent-clipped=0.0 2024-03-09 15:05:24,399 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=25480.0, ans=0.0 2024-03-09 15:05:31,955 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=25546.666666666668, ans=0.1 2024-03-09 15:05:35,922 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.12 vs. 
limit=6.0 2024-03-09 15:05:38,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=25546.666666666668, ans=0.0 2024-03-09 15:05:45,983 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.51 vs. limit=15.0 2024-03-09 15:05:50,869 INFO [train.py:997] (0/4) Epoch 25, batch 50, loss[loss=0.1952, simple_loss=0.2743, pruned_loss=0.05803, over 23275.00 frames. ], tot_loss[loss=0.1525, simple_loss=0.2401, pruned_loss=0.0325, over 1057002.68 frames. ], batch size: 534, lr: 1.61e-02, grad_scale: 64.0 2024-03-09 15:06:08,704 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=25680.0, ans=15.0 2024-03-09 15:06:16,559 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.09 vs. limit=15.0 2024-03-09 15:07:06,212 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.74 vs. limit=15.0 2024-03-09 15:07:11,204 INFO [train.py:997] (0/4) Epoch 25, batch 100, loss[loss=0.1584, simple_loss=0.2518, pruned_loss=0.03244, over 23996.00 frames. ], tot_loss[loss=0.1517, simple_loss=0.2394, pruned_loss=0.03203, over 1879158.15 frames. ], batch size: 416, lr: 1.61e-02, grad_scale: 64.0 2024-03-09 15:07:22,587 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=25946.666666666668, ans=0.125 2024-03-09 15:07:37,664 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.239e+01 7.935e+01 8.679e+01 9.503e+01 1.168e+02, threshold=1.736e+02, percent-clipped=0.0 2024-03-09 15:07:50,091 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=26080.0, ans=0.0 2024-03-09 15:07:58,457 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=26146.666666666668, ans=0.07 2024-03-09 15:08:31,506 INFO [train.py:997] (0/4) Epoch 25, batch 150, loss[loss=0.1308, simple_loss=0.2132, pruned_loss=0.02421, over 23677.00 frames. ], tot_loss[loss=0.1503, simple_loss=0.2382, pruned_loss=0.03123, over 2512466.32 frames. ], batch size: 116, lr: 1.61e-02, grad_scale: 64.0 2024-03-09 15:08:43,626 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-25.pt 2024-03-09 15:09:26,510 INFO [train.py:997] (0/4) Epoch 26, batch 0, loss[loss=0.148, simple_loss=0.2348, pruned_loss=0.03061, over 24281.00 frames. ], tot_loss[loss=0.148, simple_loss=0.2348, pruned_loss=0.03061, over 24281.00 frames. ], batch size: 281, lr: 1.58e-02, grad_scale: 64.0 2024-03-09 15:09:26,510 INFO [train.py:1020] (0/4) Computing validation loss 2024-03-09 15:09:35,915 INFO [train.py:1029] (0/4) Epoch 26, validation: loss=0.2091, simple_loss=0.3013, pruned_loss=0.05842, over 452978.00 frames. 
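An aside on the ScheduledFloat records: they trace module-level constants (dropout probabilities, skip rates, balancer limits) that are annealed as a function of training progress, where batch_count is the schedule's x-coordinate (fractional here, presumably because it is rescaled by batch duration) and ans is the value currently in effect. A sketch of piecewise-linear scheduling under those assumptions; the breakpoints below are invented for illustration, not those of any parameter logged here:

    # Hypothetical piecewise-linear schedule over batch_count.
    import bisect

    class PiecewiseLinear:
        def __init__(self, *points):
            # points: (batch_count, value) pairs, sorted by batch_count.
            self.xs = [p[0] for p in points]
            self.ys = [p[1] for p in points]

        def __call__(self, x: float) -> float:
            if x <= self.xs[0]:
                return self.ys[0]
            if x >= self.xs[-1]:
                return self.ys[-1]
            i = bisect.bisect_right(self.xs, x)
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

    skip_rate = PiecewiseLinear((0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0))
    print(skip_rate(25546.67))  # past the last breakpoint -> 0.0
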
2024-03-09 15:09:35,915 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB 2024-03-09 15:09:51,742 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=26333.333333333332, ans=0.05 2024-03-09 15:09:59,627 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=26400.0, ans=0.005130434782608696 2024-03-09 15:10:04,339 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=26400.0, ans=0.125 2024-03-09 15:10:18,320 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=26466.666666666668, ans=0.125 2024-03-09 15:10:46,303 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.24 vs. limit=10.0 2024-03-09 15:10:55,261 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/checkpoint-4000.pt 2024-03-09 15:10:59,541 INFO [train.py:997] (0/4) Epoch 26, batch 50, loss[loss=0.1583, simple_loss=0.2526, pruned_loss=0.032, over 24015.00 frames. ], tot_loss[loss=0.1477, simple_loss=0.2356, pruned_loss=0.02994, over 1071984.71 frames. ], batch size: 388, lr: 1.57e-02, grad_scale: 64.0 2024-03-09 15:11:01,412 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=26666.666666666668, ans=0.125 2024-03-09 15:11:09,136 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=26666.666666666668, ans=0.125 2024-03-09 15:11:11,922 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.387e+01 7.632e+01 8.183e+01 8.952e+01 1.265e+02, threshold=1.637e+02, percent-clipped=0.0 2024-03-09 15:12:19,645 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=26933.333333333332, ans=0.005014492753623189 2024-03-09 15:12:20,295 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.47 vs. limit=15.0 2024-03-09 15:12:21,232 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=27000.0, ans=0.125 2024-03-09 15:12:22,491 INFO [train.py:997] (0/4) Epoch 26, batch 100, loss[loss=0.1537, simple_loss=0.2503, pruned_loss=0.02859, over 24072.00 frames. ], tot_loss[loss=0.1487, simple_loss=0.2372, pruned_loss=0.03011, over 1874992.90 frames. ], batch size: 416, lr: 1.57e-02, grad_scale: 64.0 2024-03-09 15:13:02,494 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.10 vs. limit=15.0 2024-03-09 15:13:03,272 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=27133.333333333332, ans=0.125 2024-03-09 15:13:20,909 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.68 vs. 
limit=12.0 2024-03-09 15:13:26,243 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=27266.666666666668, ans=0.1 2024-03-09 15:13:41,515 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.24 vs. limit=15.0 2024-03-09 15:13:42,259 INFO [train.py:997] (0/4) Epoch 26, batch 150, loss[loss=0.1536, simple_loss=0.2487, pruned_loss=0.02921, over 23951.00 frames. ], tot_loss[loss=0.1493, simple_loss=0.2381, pruned_loss=0.0303, over 2521242.32 frames. ], batch size: 416, lr: 1.57e-02, grad_scale: 64.0 2024-03-09 15:13:49,027 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=27333.333333333332, ans=0.125 2024-03-09 15:13:55,059 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-26.pt 2024-03-09 15:14:38,730 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 5.880e+01 7.565e+01 8.210e+01 9.162e+01 1.256e+02, threshold=1.642e+02, percent-clipped=0.0 2024-03-09 15:14:38,763 INFO [train.py:997] (0/4) Epoch 27, batch 0, loss[loss=0.1575, simple_loss=0.2397, pruned_loss=0.03766, over 23932.00 frames. ], tot_loss[loss=0.1575, simple_loss=0.2397, pruned_loss=0.03766, over 23932.00 frames. ], batch size: 153, lr: 1.54e-02, grad_scale: 64.0 2024-03-09 15:14:38,763 INFO [train.py:1020] (0/4) Computing validation loss 2024-03-09 15:14:48,406 INFO [train.py:1029] (0/4) Epoch 27, validation: loss=0.2114, simple_loss=0.3031, pruned_loss=0.05987, over 452978.00 frames. 2024-03-09 15:14:48,406 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB 2024-03-09 15:15:39,429 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=27520.0, ans=0.125 2024-03-09 15:15:51,800 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=27586.666666666668, ans=0.0 2024-03-09 15:15:54,934 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=27586.666666666668, ans=0.2 2024-03-09 15:16:04,701 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.76 vs. limit=10.0 2024-03-09 15:16:14,483 INFO [train.py:997] (0/4) Epoch 27, batch 50, loss[loss=0.1505, simple_loss=0.2362, pruned_loss=0.03237, over 24217.00 frames. ], tot_loss[loss=0.1516, simple_loss=0.2384, pruned_loss=0.03242, over 1078106.10 frames. 
], batch size: 241, lr: 1.54e-02, grad_scale: 64.0 2024-03-09 15:16:17,944 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=27720.0, ans=0.125 2024-03-09 15:16:24,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=27720.0, ans=0.125 2024-03-09 15:16:25,637 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=27720.0, ans=0.07 2024-03-09 15:16:39,669 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=27786.666666666668, ans=0.004828985507246377 2024-03-09 15:16:50,030 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.75 vs. limit=10.0 2024-03-09 15:17:06,124 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-03-09 15:17:33,762 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.709e+01 7.734e+01 8.550e+01 9.615e+01 1.355e+02, threshold=1.710e+02, percent-clipped=0.0 2024-03-09 15:17:33,799 INFO [train.py:997] (0/4) Epoch 27, batch 100, loss[loss=0.1466, simple_loss=0.2324, pruned_loss=0.03042, over 23679.00 frames. ], tot_loss[loss=0.1491, simple_loss=0.2369, pruned_loss=0.03063, over 1897391.58 frames. ], batch size: 129, lr: 1.53e-02, grad_scale: 64.0 2024-03-09 15:17:44,844 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=28053.333333333332, ans=0.125 2024-03-09 15:18:03,460 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=28120.0, ans=0.025 2024-03-09 15:18:24,712 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=28253.333333333332, ans=0.0 2024-03-09 15:18:35,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=28253.333333333332, ans=0.2 2024-03-09 15:18:50,817 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=28320.0, ans=0.125 2024-03-09 15:18:54,410 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=28386.666666666668, ans=0.125 2024-03-09 15:18:55,715 INFO [train.py:997] (0/4) Epoch 27, batch 150, loss[loss=0.1433, simple_loss=0.2289, pruned_loss=0.02883, over 23245.00 frames. ], tot_loss[loss=0.1495, simple_loss=0.2383, pruned_loss=0.03039, over 2524252.50 frames. ], batch size: 102, lr: 1.53e-02, grad_scale: 64.0 2024-03-09 15:19:08,946 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-27.pt 2024-03-09 15:19:49,041 INFO [train.py:997] (0/4) Epoch 28, batch 0, loss[loss=0.1464, simple_loss=0.2333, pruned_loss=0.02969, over 24248.00 frames. ], tot_loss[loss=0.1464, simple_loss=0.2333, pruned_loss=0.02969, over 24248.00 frames. ], batch size: 188, lr: 1.50e-02, grad_scale: 64.0 2024-03-09 15:19:49,042 INFO [train.py:1020] (0/4) Computing validation loss 2024-03-09 15:19:59,332 INFO [train.py:1029] (0/4) Epoch 28, validation: loss=0.2107, simple_loss=0.3034, pruned_loss=0.05903, over 452978.00 frames. 
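An aside on the recurring optim.py warnings: the five grad-norm numbers are quantiles (min, 25%, median, 75%, max) of recent gradient norms, and in every entry here the reported threshold equals Clipping_scale = 2.0 times the median (e.g. 2.0 x 8.550e+01 = 1.710e+02 just above), with percent-clipped giving the fraction of batches whose norm exceeded it. A rough sketch consistent with those numbers; the window size and the in-place rescaling are assumptions:

    # Hypothetical median-based gradient clipping, matching the
    # "threshold = 2.0 * median grad-norm" relation visible in this log.
    import torch

    def clip_by_median(params, norm_history, clipping_scale=2.0, window=200):
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        norm_history.append(norm)
        del norm_history[:-window]  # keep a sliding window of recent norms
        median = sorted(norm_history)[len(norm_history) // 2]
        threshold = clipping_scale * median
        if norm > threshold:  # rescale gradients down to the threshold
            for g in grads:
                g.mul_(threshold / norm)
        return norm, threshold
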
2024-03-09 15:19:59,333 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB 2024-03-09 15:20:54,536 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=28640.0, ans=0.2 2024-03-09 15:21:11,008 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.458e+01 7.529e+01 8.136e+01 8.999e+01 1.198e+02, threshold=1.627e+02, percent-clipped=0.0 2024-03-09 15:21:23,147 INFO [train.py:997] (0/4) Epoch 28, batch 50, loss[loss=0.1528, simple_loss=0.238, pruned_loss=0.03385, over 24062.00 frames. ], tot_loss[loss=0.1489, simple_loss=0.2361, pruned_loss=0.03084, over 1059671.48 frames. ], batch size: 176, lr: 1.50e-02, grad_scale: 64.0 2024-03-09 15:21:27,960 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=28773.333333333332, ans=0.1 2024-03-09 15:21:56,861 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=28906.666666666668, ans=0.004585507246376811 2024-03-09 15:22:02,501 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.92 vs. limit=15.0 2024-03-09 15:22:23,176 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=28973.333333333332, ans=0.125 2024-03-09 15:22:25,560 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.45 vs. limit=22.5 2024-03-09 15:22:43,071 INFO [train.py:997] (0/4) Epoch 28, batch 100, loss[loss=0.1415, simple_loss=0.2314, pruned_loss=0.02578, over 23388.00 frames. ], tot_loss[loss=0.1471, simple_loss=0.2356, pruned_loss=0.02928, over 1873994.08 frames. ], batch size: 102, lr: 1.50e-02, grad_scale: 64.0 2024-03-09 15:23:34,734 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=29306.666666666668, ans=0.004498550724637681 2024-03-09 15:23:50,078 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.379e+01 7.431e+01 8.104e+01 8.725e+01 1.109e+02, threshold=1.621e+02, percent-clipped=0.0 2024-03-09 15:24:02,913 INFO [train.py:997] (0/4) Epoch 28, batch 150, loss[loss=0.1391, simple_loss=0.2318, pruned_loss=0.02323, over 24072.00 frames. ], tot_loss[loss=0.1469, simple_loss=0.236, pruned_loss=0.02888, over 2513280.61 frames. ], batch size: 344, lr: 1.50e-02, grad_scale: 64.0 2024-03-09 15:24:10,027 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-03-09 15:24:15,504 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-28.pt 2024-03-09 15:24:57,644 INFO [train.py:997] (0/4) Epoch 29, batch 0, loss[loss=0.1661, simple_loss=0.2594, pruned_loss=0.03642, over 23744.00 frames. ], tot_loss[loss=0.1661, simple_loss=0.2594, pruned_loss=0.03642, over 23744.00 frames. ], batch size: 486, lr: 1.47e-02, grad_scale: 64.0 2024-03-09 15:24:57,644 INFO [train.py:1020] (0/4) Computing validation loss 2024-03-09 15:25:06,829 INFO [train.py:1029] (0/4) Epoch 29, validation: loss=0.2094, simple_loss=0.3019, pruned_loss=0.05844, over 452978.00 frames. 
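An aside on the lr values: they fall smoothly from about 2.0e-02 around epoch 18 to 1.43e-02 by epoch 30, decaying both within and across epochs, which is the shape of an Eden-style schedule with separate batch and epoch decay terms. A sketch of such a schedule; the base rate and the two half-life constants below are illustrative assumptions, not read from this log:

    # Hypothetical Eden-style learning-rate schedule; base_lr, lr_batches,
    # and lr_epochs are illustrative constants.
    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    for epoch in (18, 24, 30):
        print(epoch, eden_lr(0.045, batch=epoch * 1000, epoch=epoch))
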
2024-03-09 15:25:06,829 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB 2024-03-09 15:25:25,786 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-03-09 15:25:35,099 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2024-03-09 15:26:11,366 INFO [scaling.py:1119] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-03-09 15:26:32,419 INFO [train.py:997] (0/4) Epoch 29, batch 50, loss[loss=0.16, simple_loss=0.2447, pruned_loss=0.03766, over 23922.00 frames. ], tot_loss[loss=0.1437, simple_loss=0.2346, pruned_loss=0.0264, over 1069248.24 frames. ], batch size: 153, lr: 1.47e-02, grad_scale: 64.0 2024-03-09 15:26:59,011 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=29893.333333333332, ans=0.125 2024-03-09 15:27:10,062 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=29960.0, ans=0.125 2024-03-09 15:27:27,042 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.485e+01 7.617e+01 8.419e+01 9.074e+01 1.218e+02, threshold=1.684e+02, percent-clipped=0.0 2024-03-09 15:27:34,959 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=30093.333333333332, ans=0.0 2024-03-09 15:27:38,565 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.31 vs. limit=15.0 2024-03-09 15:27:41,098 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=30093.333333333332, ans=0.125 2024-03-09 15:27:41,152 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=30093.333333333332, ans=0.125 2024-03-09 15:27:55,006 INFO [train.py:997] (0/4) Epoch 29, batch 100, loss[loss=0.1432, simple_loss=0.239, pruned_loss=0.02375, over 24013.00 frames. ], tot_loss[loss=0.1462, simple_loss=0.2367, pruned_loss=0.02781, over 1887373.78 frames. ], batch size: 388, lr: 1.47e-02, grad_scale: 64.0 2024-03-09 15:27:58,381 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=30160.0, ans=0.0 2024-03-09 15:28:25,580 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=30293.333333333332, ans=0.1 2024-03-09 15:28:39,143 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=30293.333333333332, ans=0.125 2024-03-09 15:28:48,288 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=30360.0, ans=0.1 2024-03-09 15:28:59,362 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=30426.666666666668, ans=0.125 2024-03-09 15:29:12,925 INFO [train.py:997] (0/4) Epoch 29, batch 150, loss[loss=0.1253, simple_loss=0.2077, pruned_loss=0.02143, over 23872.00 frames. ], tot_loss[loss=0.146, simple_loss=0.2357, pruned_loss=0.02818, over 2524369.47 frames. 
], batch size: 117, lr: 1.46e-02, grad_scale: 64.0 2024-03-09 15:29:19,439 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=30493.333333333332, ans=0.0 2024-03-09 15:29:24,886 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-29.pt 2024-03-09 15:30:06,218 INFO [train.py:997] (0/4) Epoch 30, batch 0, loss[loss=0.1409, simple_loss=0.2265, pruned_loss=0.02767, over 24189.00 frames. ], tot_loss[loss=0.1409, simple_loss=0.2265, pruned_loss=0.02767, over 24189.00 frames. ], batch size: 217, lr: 1.44e-02, grad_scale: 64.0 2024-03-09 15:30:06,219 INFO [train.py:1020] (0/4) Computing validation loss 2024-03-09 15:30:13,365 INFO [zipformer.py:1858] (0/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6786, 4.0017, 4.5676, 3.9307], device='cuda:0') 2024-03-09 15:30:18,510 INFO [train.py:1029] (0/4) Epoch 30, validation: loss=0.2105, simple_loss=0.3027, pruned_loss=0.05915, over 452978.00 frames. 2024-03-09 15:30:18,511 INFO [train.py:1030] (0/4) Maximum memory allocated so far is 28092MB 2024-03-09 15:30:46,868 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=30613.333333333332, ans=0.004214492753623188 2024-03-09 15:30:54,265 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=30680.0, ans=0.1 2024-03-09 15:31:01,626 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.221e+01 6.992e+01 7.523e+01 8.232e+01 1.586e+02, threshold=1.505e+02, percent-clipped=0.0 2024-03-09 15:31:15,865 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=30746.666666666668, ans=0.2 2024-03-09 15:31:40,991 INFO [train.py:997] (0/4) Epoch 30, batch 50, loss[loss=0.1512, simple_loss=0.2498, pruned_loss=0.02631, over 23747.00 frames. ], tot_loss[loss=0.144, simple_loss=0.2324, pruned_loss=0.02777, over 1075031.15 frames. ], batch size: 447, lr: 1.44e-02, grad_scale: 64.0 2024-03-09 15:31:42,814 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=30880.0, ans=0.2 2024-03-09 15:32:14,019 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=31013.333333333332, ans=0.125 2024-03-09 15:32:18,506 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=31013.333333333332, ans=0.2 2024-03-09 15:32:48,602 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.20 vs. limit=6.0 2024-03-09 15:33:01,491 INFO [train.py:997] (0/4) Epoch 30, batch 100, loss[loss=0.1258, simple_loss=0.2124, pruned_loss=0.01958, over 24093.00 frames. ], tot_loss[loss=0.1452, simple_loss=0.234, pruned_loss=0.02825, over 1888671.92 frames. 
], batch size: 142, lr: 1.43e-02, grad_scale: 64.0 2024-03-09 15:33:12,562 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=31213.333333333332, ans=0.0 2024-03-09 15:33:24,721 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=31280.0, ans=0.125 2024-03-09 15:33:31,688 INFO [scaling.py:1023] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.08 vs. limit=22.5 2024-03-09 15:33:43,907 WARNING [optim.py:487] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.080e+01 7.332e+01 7.826e+01 8.661e+01 1.231e+02, threshold=1.565e+02, percent-clipped=0.0 2024-03-09 15:34:20,969 INFO [train.py:997] (0/4) Epoch 30, batch 150, loss[loss=0.1436, simple_loss=0.2333, pruned_loss=0.027, over 24222.00 frames. ], tot_loss[loss=0.1449, simple_loss=0.2337, pruned_loss=0.02802, over 2520301.04 frames. ], batch size: 241, lr: 1.43e-02, grad_scale: 64.0 2024-03-09 15:34:30,486 INFO [scaling.py:214] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=31546.666666666668, ans=0.0 2024-03-09 15:34:33,217 INFO [checkpoint.py:75] (0/4) Saving checkpoint to zipformer/exp/epoch-30.pt 2024-03-09 15:34:38,240 INFO [train.py:1248] (0/4) Done!
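
With training finished at epoch 30, a common follow-up is to average the last few epoch-*.pt checkpoints saved above before decoding. A minimal sketch using plain PyTorch; the "model" key and the choice of epochs 26-30 are assumptions for illustration:

    # Hypothetical checkpoint averaging; assumes each epoch-N.pt stores the
    # model weights under a "model" key, as icefall-style checkpoints do.
    import torch

    def average_checkpoints(paths):
        avg = None
        for path in paths:
            state = torch.load(path, map_location="cpu")["model"]
            if avg is None:
                avg = {k: v.clone().float() for k, v in state.items()}
            else:
                for k, v in state.items():
                    avg[k] += v.float()
        return {k: v / len(paths) for k, v in avg.items()}

    paths = [f"zipformer/exp/epoch-{e}.pt" for e in range(26, 31)]
    torch.save({"model": average_checkpoints(paths)}, "zipformer/exp/epoch-30-avg-5.pt")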