2024-10-08 00:02:08,625 INFO [train.py:1204] (0/2) Training started
2024-10-08 00:02:08,630 INFO [train.py:1214] (0/2) Device: cuda:0
2024-10-08 00:02:08,637 INFO [train.py:1245] (0/2) Using dtype=torch.float16
2024-10-08 00:02:08,637 INFO [train.py:1246] (0/2) Use AMP=True
2024-10-08 00:02:08,637 INFO [train.py:1248] (0/2) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'ignore_id': -1, 'label_smoothing': 0.1, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '8905c6b481d50c1b040a466aa1602df0474818ec', 'k2-git-date': 'Thu Apr 25 12:26:15 2024', 'lhotse-version': '1.23.0.dev+git.ed5797c1.clean', 'torch-version': '2.1.0+cu121', 'torch-cuda-available': True, 'torch-cuda-version': '12.1', 'python-version': '3.1', 'icefall-git-branch': 'dev/asr/libritts', 'icefall-git-sha1': 'f0744877-dirty', 'icefall-git-date': 'Mon Oct 7 23:32:03 2024', 'icefall-path': '/mnt/nvme_share/jinzr/miniconda3/envs/osa/lib/python3.10/site-packages/icefall-1.0-py3.10.egg', 'k2-path': '/mnt/nvme_share/jinzr/miniconda3/envs/osa/lib/python3.10/site-packages/k2-1.24.4.dev20240521+cuda12.1.torch2.1.0-py3.10-linux-x86_64.egg/k2/__init__.py', 'lhotse-path': '/mnt/nvme_share/jinzr/miniconda3/envs/osa/lib/python3.10/site-packages/lhotse-1.23.0.dev0+git.ed5797c1.clean-py3.10.egg/lhotse/__init__.py', 'hostname': 'serverx32', 'IP address': '127.0.1.1'}, 'world_size': 2, 'master_port': 12354, 'tensorboard': True, 'num_epochs': 55, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('zipformer/exp'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'ctc_loss_scale': 0.2, 'attention_decoder_loss_scale': 0.8, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'use_bf16': False, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'attention_decoder_dim': 512, 'attention_decoder_num_layers': 6, 'attention_decoder_attention_dim': 512, 'attention_decoder_num_heads': 8, 'attention_decoder_feedforward_dim': 2048, 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': True, 'use_ctc': False, 'use_attention_decoder': False, 'full_libri': True, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 3600, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'blank_id': 0, 'sos_id': 1, 'eos_id': 1, 'vocab_size': 500, 'dtype': torch.float16, 'use_autocast': True}
2024-10-08 00:02:08,638 INFO [train.py:1250] (0/2) About to create model
2024-10-08 00:02:09,177 INFO [train.py:1254] (0/2) Number of model parameters: 65549011
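The Device/dtype lines above show this run training in torch.float16 under automatic mixed precision ('use_autocast': True), with the dynamic loss scale visible later in the per-batch 'grad_scale' field. A minimal sketch of that autocast-plus-GradScaler pattern, using a toy model rather than icefall's actual train.py loop:

```python
import torch

# Toy stand-ins so the AMP pattern is runnable; the real recipe builds a
# Zipformer transducer and a lhotse dataloader instead.
model = torch.nn.Linear(80, 500).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=0.045)
scaler = torch.cuda.amp.GradScaler(enabled=True)

for step in range(3):
    x = torch.randn(8, 80, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(enabled=True, dtype=torch.float16):
        loss = model(x).pow(2).mean()  # forward runs in float16 where safe
    scaler.scale(loss).backward()      # scale the loss so fp16 grads don't underflow
    scaler.step(optimizer)             # unscales grads; skips the step on inf/nan
    scaler.update()                    # grows/shrinks the scale dynamically
```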
2024-10-08 00:02:10,803 INFO [train.py:1269] (0/2) Using DDP
2024-10-08 00:02:13,618 INFO [asr_datamodule.py:425] (0/2) About to get the shuffled train-clean-100, train-clean-360 and train-other-500 cuts
2024-10-08 00:02:13,620 INFO [asr_datamodule.py:226] (0/2) Enable MUSAN
2024-10-08 00:02:13,620 INFO [asr_datamodule.py:227] (0/2) About to get Musan cuts
2024-10-08 00:02:15,003 INFO [asr_datamodule.py:251] (0/2) Enable SpecAugment
2024-10-08 00:02:15,004 INFO [asr_datamodule.py:252] (0/2) Time warp factor: 80
2024-10-08 00:02:15,004 INFO [asr_datamodule.py:262] (0/2) Num frame mask: 10
2024-10-08 00:02:15,010 INFO [asr_datamodule.py:275] (0/2) About to create train dataset
2024-10-08 00:02:15,010 INFO [asr_datamodule.py:302] (0/2) Using DynamicBucketingSampler.
2024-10-08 00:02:15,813 INFO [asr_datamodule.py:319] (0/2) About to create train dataloader
2024-10-08 00:02:15,813 INFO [asr_datamodule.py:435] (0/2) About to get dev-clean cuts
2024-10-08 00:02:15,816 INFO [asr_datamodule.py:442] (0/2) About to get dev-other cuts
2024-10-08 00:02:15,817 INFO [asr_datamodule.py:350] (0/2) About to create dev dataset
2024-10-08 00:02:16,297 INFO [asr_datamodule.py:367] (0/2) About to create dev dataloader
2024-10-08 00:02:16,297 INFO [train.py:1474] (0/2) Sanity check -- see if any of the batches in epoch 1 would cause OOM.
2024-10-08 00:05:47,611 INFO [scaling.py:1024] (0/2) Whitening: name=None, num_groups=1, num_channels=384, metric=95.49 vs. limit=4.0
2024-10-08 00:05:49,591 INFO [train.py:1504] (0/2) Maximum memory allocated so far is 49852MB
2024-10-08 00:05:52,712 INFO [train.py:1504] (0/2) Maximum memory allocated so far is 51905MB
2024-10-08 00:05:55,334 INFO [scaling.py:1024] (0/2) Whitening: name=None, num_groups=1, num_channels=384, metric=165.41 vs. limit=7.5
2024-10-08 00:05:56,411 INFO [train.py:1504] (0/2) Maximum memory allocated so far is 51905MB
2024-10-08 00:05:59,880 INFO [scaling.py:1024] (0/2) Whitening: name=None, num_groups=1, num_channels=128, metric=103.71 vs. limit=5.0
2024-10-08 00:06:00,170 INFO [train.py:1504] (0/2) Maximum memory allocated so far is 51905MB
2024-10-08 00:06:06,933 INFO [train.py:1504] (0/2) Maximum memory allocated so far is 51905MB
2024-10-08 00:06:13,315 INFO [train.py:1504] (0/2) Maximum memory allocated so far is 51905MB
2024-10-08 00:06:45,806 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=4.53 vs. limit=7.5
2024-10-08 00:06:46,957 INFO [train.py:1136] (0/2) Epoch 1, batch 0, loss[loss=7.827, simple_loss=7.124, pruned_loss=7.017, over 86366.00 frames. ], tot_loss[loss=7.827, simple_loss=7.124, pruned_loss=7.017, over 86366.00 frames. ], batch size: 213, lr: 2.25e-02, grad_scale: 1.0
2024-10-08 00:06:46,958 INFO [train.py:1159] (0/2) Computing validation loss
2024-10-08 00:06:59,439 INFO [train.py:1168] (0/2) Epoch 1, validation: loss=7.956, simple_loss=7.248, pruned_loss=7.066, over 1382211.00 frames.
2024-10-08 00:06:59,440 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 51905MB
2024-10-08 00:07:06,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=0.0, ans=0.5
2024-10-08 00:07:07,412 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.32 vs. limit=5.0
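The asr_datamodule.py lines above build the LibriTTS training pipeline: MUSAN noise mixing, SpecAugment (time-warp factor 80, 10 frame masks), and a DynamicBucketingSampler that packs cuts of similar duration into batches capped at max_duration=3600 seconds. A sketch of the lhotse calls involved; the manifest filename is illustrative, only the directory (data/fbank) comes from the logged config:

```python
from torch.utils.data import DataLoader
from lhotse import CutSet
from lhotse.dataset import DynamicBucketingSampler, K2SpeechRecognitionDataset

# Illustrative manifest name; the recipe loads precomputed fbank cuts from data/fbank.
cuts = CutSet.from_file("data/fbank/cuts_train.jsonl.gz")

sampler = DynamicBucketingSampler(
    cuts,
    max_duration=3600,  # seconds of audio per batch, as in the logged config
    num_buckets=30,
    shuffle=True,
    drop_last=True,
)
dataset = K2SpeechRecognitionDataset(return_cuts=True)
# lhotse samplers yield whole batches of cuts, so batch_size=None here.
train_dl = DataLoader(dataset, sampler=sampler, batch_size=None, num_workers=2)
```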
2024-10-08 00:07:32,233 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=120.0, ans=0.8958
2024-10-08 00:07:40,947 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 1.654e+04 1.718e+04 1.908e+04 2.207e+04 2.415e+04, threshold=7.633e+04, percent-clipped=0.0
2024-10-08 00:07:50,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=240.0, ans=0.48875
2024-10-08 00:08:05,174 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.648e+03 1.246e+04 1.771e+04 2.415e+04 1.741e+05, threshold=7.085e+04, percent-clipped=10.0
2024-10-08 00:08:06,491 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.72 vs. limit=4.096
2024-10-08 00:08:40,436 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=63.40 vs. limit=5.24
2024-10-08 00:08:43,800 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=480.0, ans=0.8832
2024-10-08 00:08:44,492 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=236.80 vs. limit=7.68
2024-10-08 00:08:52,727 WARNING [optim.py:503] (0/2) Scaling gradients by 0.08551561087369919, model_norm_threshold=70846.9453125
2024-10-08 00:08:52,863 WARNING [optim.py:575] (0/2) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.self_attn_weights.in_proj.weight with proportion 0.40, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.723e+11, grad_sumsq=8.418e+14, orig_rms_sq=3.235e-04
2024-10-08 00:08:57,399 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.648e+03 6.984e+03 1.654e+04 2.436e+04 1.268e+06, threshold=6.617e+04, percent-clipped=7.5
2024-10-08 00:08:57,400 WARNING [optim.py:503] (0/2) Scaling gradients by 0.05218914523720741, model_norm_threshold=66168.734375
2024-10-08 00:08:57,533 WARNING [optim.py:575] (0/2) Parameter dominating tot_sumsq module.encoder.encoders.5.encoder.layers.0.self_attn_weights.in_proj.weight with proportion 0.35, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.622e+11, grad_sumsq=2.142e+15, orig_rms_sq=2.624e-04
2024-10-08 00:09:01,479 INFO [train.py:1136] (0/2) Epoch 1, batch 50, loss[loss=1.661, simple_loss=1.489, pruned_loss=1.555, over 87139.00 frames. ], tot_loss[loss=3.652, simple_loss=3.353, pruned_loss=2.91, over 3826886.77 frames. ], batch size: 330, lr: 2.48e-02, grad_scale: 0.0078125
2024-10-08 00:09:09,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=600.0, ans=0.294
2024-10-08 00:09:12,409 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=600.0, ans=7.725
2024-10-08 00:09:14,938 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=54.76 vs. limit=7.725
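The WARNING [optim.py:...] lines above are ScaledAdam's gradient-norm diagnostics: the five numbers are quartiles (min/25%/median/75%/max) of recent gradient norms, 'threshold' is the clipping bound derived from that history, 'percent-clipped' is how often recent batches hit it, and when a batch exceeds model_norm_threshold the optimizer rescales the gradient and names the parameter dominating the squared norm. A simplified sketch of norm-history-based clipping; the threshold formula is an assumption (in this log the threshold tracks roughly 2 * clipping_scale * median), not icefall's exact code:

```python
from collections import deque

import torch


class MedianGradClipper:
    """Clip gradients against a multiple of the running median norm (sketch)."""

    def __init__(self, clipping_scale: float = 2.0, history: int = 128):
        self.clipping_scale = clipping_scale
        self.norms: deque[float] = deque(maxlen=history)

    def __call__(self, params) -> float:
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.linalg.vector_norm(
            torch.stack([torch.linalg.vector_norm(g) for g in grads])
        ).item()
        self.norms.append(norm)
        median = sorted(self.norms)[len(self.norms) // 2]
        threshold = 2.0 * self.clipping_scale * median  # assumed formula
        if norm > threshold:
            scale = threshold / norm  # cf. "Scaling gradients by 0.0855..."
            for g in grads:
                g.mul_(scale)
            return scale
        return 1.0
```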
2024-10-08 00:09:22,174 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=600.0, ans=5.375
2024-10-08 00:09:22,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=600.0, ans=0.048125
2024-10-08 00:09:28,598 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=252.22 vs. limit=8.04
2024-10-08 00:09:28,744 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=55.91 vs. limit=7.77
2024-10-08 00:09:35,038 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=413.75 vs. limit=7.77
2024-10-08 00:09:37,461 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=61.77 vs. limit=7.77
2024-10-08 00:09:39,178 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=120.65 vs. limit=5.36
2024-10-08 00:09:45,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=720.0, ans=0.46625
2024-10-08 00:09:48,421 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=296.46 vs. limit=7.815
2024-10-08 00:10:05,727 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=840.0, ans=5.42
2024-10-08 00:10:07,769 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=44.01 vs. limit=7.815
2024-10-08 00:10:20,345 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=11.54 vs. limit=4.384
2024-10-08 00:10:33,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1080.0, ans=0.035
2024-10-08 00:10:33,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=1080.0, ans=0.0757
2024-10-08 00:10:40,610 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=145.87 vs. limit=7.905
2024-10-08 00:10:42,587 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.33 vs. limit=4.432
2024-10-08 00:10:42,879 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=152.77 vs. limit=8.31
2024-10-08 00:11:00,058 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/bad-model-first-warning-0.pt
2024-10-08 00:11:03,875 WARNING [train.py:1127] (0/2) Grad scale is small: 0.0078125
2024-10-08 00:11:03,876 INFO [train.py:1136] (0/2) Epoch 1, batch 100, loss[loss=1.381, simple_loss=1.204, pruned_loss=1.426, over 87451.00 frames. ], tot_loss[loss=2.428, simple_loss=2.2, pruned_loss=2.098, over 6783660.93 frames. ], batch size: 490, lr: 2.70e-02, grad_scale: 0.015625
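"Grad scale is small: 0.0078125" is the AMP watchdog: every inf/nan batch makes GradScaler halve its scale, so a scale of 2**-7 means many recently skipped steps, and the recipe dumps a bad-model-first-warning checkpoint for inspection. A sketch of such a check; the 0.01 warning threshold and the hard-failure floor are assumptions patterned on the log, not verified against train.py:

```python
import torch


def check_grad_scale(scaler, model, exp_dir="zipformer/exp"):
    # Thresholds are assumptions modeled on the warnings above.
    cur_grad_scale = scaler.get_scale()
    if cur_grad_scale < 0.01:
        print(f"WARNING: Grad scale is small: {cur_grad_scale}")
        torch.save(model.state_dict(), f"{exp_dir}/bad-model-first-warning-0.pt")
    if cur_grad_scale < 1.0e-05:
        raise RuntimeError(f"grad_scale is too small: {cur_grad_scale}")
```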
2024-10-08 00:11:17,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=1200.0, ans=0.07300000000000001
2024-10-08 00:11:21,173 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.523e+02 7.025e+02 2.892e+03 1.654e+04 1.268e+06, threshold=5.784e+03, percent-clipped=5.0
2024-10-08 00:11:21,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=1200.0, ans=0.155
2024-10-08 00:11:45,515 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=49.68 vs. limit=7.995
2024-10-08 00:11:58,677 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.33 vs. limit=8.58
2024-10-08 00:12:03,147 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.38 vs. limit=8.58
2024-10-08 00:12:04,481 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=152.76 vs. limit=8.04
2024-10-08 00:12:24,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1560.0, ans=0.2844
2024-10-08 00:12:33,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1560.0, ans=0.426875
2024-10-08 00:12:53,269 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=1680.0, ans=0.044750000000000005
2024-10-08 00:12:58,258 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.08 vs. limit=8.76
2024-10-08 00:13:00,081 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=60.60 vs. limit=8.76
2024-10-08 00:13:02,434 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=30.04 vs. limit=8.85
2024-10-08 00:13:03,480 INFO [train.py:1136] (0/2) Epoch 1, batch 150, loss[loss=1.098, simple_loss=0.9329, pruned_loss=1.192, over 85676.00 frames. ], tot_loss[loss=1.937, simple_loss=1.733, pruned_loss=1.77, over 9030590.90 frames. ], batch size: 180, lr: 2.93e-02, grad_scale: 0.015625
2024-10-08 00:13:04,395 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.13 vs. limit=5.45
2024-10-08 00:13:06,842 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=43.65 vs. limit=8.175
2024-10-08 00:13:11,527 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.36 vs. limit=8.85
2024-10-08 00:13:25,240 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=94.39 vs. limit=8.22
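The [scaling.py:214] ScheduledFloat lines report hyperparameters that are schedules rather than constants: dropout probabilities, skip rates, balancer bounds, and whitening limits are all functions of batch_count, and each log line prints the schedule's current output ('ans'). A simplified piecewise-linear version of the idea (the real ScheduledFloat in icefall's scaling.py is an nn.Module with extra machinery):

```python
import bisect


class ScheduledFloat:
    """Piecewise-linear schedule over batch_count (simplified sketch)."""

    def __init__(self, *points: tuple[float, float]):
        self.xs = [x for x, _ in points]
        self.ys = [y for _, y in points]

    def value(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)


# e.g. a dropout annealing from 0.3 at batch 0 to 0.0 by batch 20000
# (illustrative endpoints, chosen to resemble the logged dropout_p values):
dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.0))
print(dropout_p.value(600.0))  # 0.291
```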
2024-10-08 00:13:26,823 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=31.39 vs. limit=8.22
2024-10-08 00:13:34,402 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=14.97 vs. limit=8.22
2024-10-08 00:13:36,226 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.81 vs. limit=4.768
2024-10-08 00:13:51,465 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.79 vs. limit=9.03
2024-10-08 00:14:02,327 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=38.96 vs. limit=8.265
2024-10-08 00:14:03,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=2040.0, ans=0.1235
2024-10-08 00:14:06,629 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=117.75 vs. limit=8.265
2024-10-08 00:14:08,747 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=15.67 vs. limit=8.265
2024-10-08 00:14:12,676 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=26.78 vs. limit=9.120000000000001
2024-10-08 00:14:15,189 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.62 vs. limit=9.120000000000001
2024-10-08 00:14:19,461 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.57 vs. limit=8.31
2024-10-08 00:14:59,127 INFO [train.py:1136] (0/2) Epoch 1, batch 200, loss[loss=1.125, simple_loss=0.9622, pruned_loss=1.108, over 85603.00 frames. ], tot_loss[loss=1.65, simple_loss=1.46, pruned_loss=1.551, over 10821665.93 frames. ], batch size: 787, lr: 3.15e-02, grad_scale: 0.03125
2024-10-08 00:14:59,874 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.08 vs. limit=6.2
2024-10-08 00:15:01,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2400.0, ans=0.3875
2024-10-08 00:15:15,019 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.98 vs. limit=8.4
2024-10-08 00:15:15,704 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.317e+02 3.068e+02 3.603e+02 4.925e+02 3.526e+03, threshold=7.205e+02, percent-clipped=0.0
2024-10-08 00:15:20,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=2520.0, ans=0.381875
2024-10-08 00:15:20,872 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=34.69 vs. limit=9.39
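The [scaling.py:1024] "Whitening: ... metric=X vs. limit=Y" lines fire when a Whiten module finds the covariance of some activation far from isotropic: the metric is 1.0 when the covariance is a multiple of the identity and grows as variance concentrates in a few directions, and values above the (scheduled) limit trigger a corrective gradient. A sketch of one such metric, intended to match the spirit rather than the letter of icefall's implementation:

```python
import torch


def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """x: (num_frames, num_channels). Returns >= 1.0, with equality iff the
    covariance of x is a multiple of the identity ("white" features)."""
    x = x - x.mean(dim=0)          # zero-mean each channel
    cov = (x.T @ x) / x.shape[0]   # (C, C) covariance estimate
    num_channels = cov.shape[0]
    # C * sum(cov**2) / trace(cov)**2 equals 1 exactly when cov = c * I.
    return num_channels * (cov ** 2).sum() / cov.trace() ** 2


white = torch.randn(1000, 384)
print(whitening_metric(white))                    # close to 1.0
print(whitening_metric(white * torch.rand(384)))  # larger: anisotropic channels
```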
2024-10-08 00:15:23,284 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.88 vs. limit=9.39
2024-10-08 00:15:28,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=2520.0, ans=0.381875
2024-10-08 00:15:35,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2520.0, ans=0.1055
2024-10-08 00:15:40,519 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=80.45 vs. limit=8.445
2024-10-08 00:15:49,295 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.43 vs. limit=9.48
2024-10-08 00:15:52,034 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.66 vs. limit=9.48
2024-10-08 00:15:59,963 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=31.55 vs. limit=8.49
2024-10-08 00:16:10,495 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=32.97 vs. limit=8.535
2024-10-08 00:16:21,389 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.74 vs. limit=6.38
2024-10-08 00:16:25,350 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=204.42 vs. limit=8.535
2024-10-08 00:16:34,412 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.25 vs. limit=8.58
2024-10-08 00:16:43,016 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.41 vs. limit=9.66
2024-10-08 00:16:54,494 INFO [train.py:1136] (0/2) Epoch 1, batch 250, loss[loss=1.073, simple_loss=0.91, pruned_loss=1.036, over 86974.00 frames. ], tot_loss[loss=1.472, simple_loss=1.291, pruned_loss=1.397, over 12204240.00 frames. ], batch size: 583, lr: 3.38e-02, grad_scale: 0.03125
2024-10-08 00:17:02,350 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.73 vs. limit=8.625
2024-10-08 00:17:17,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3120.0, ans=0.35375
2024-10-08 00:17:20,352 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.86 vs. limit=9.84
2024-10-08 00:17:29,372 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.57 vs. limit=8.67
2024-10-08 00:17:37,747 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.43 vs. limit=5.296
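Each loss[...] entry splits into simple_loss and pruned_loss, the two stages of k2's pruned transducer loss: a cheap "simple" joint over low-dimensional lm/am projections produces pruning bounds, and the full joiner runs only inside a band of prune_range symbols around that alignment. A sketch of the k2 calls, wired with the logged config values (lm_scale=0.25, am_scale=0.0, prune_range=5, simple_loss_scale=0.5); the recipe's actual weighting also varies with warmup:

```python
import k2


def transducer_loss(am, lm, joiner, symbols, boundary, blank_id=0):
    # Stage 1: smoothed "simple" loss; return_grad=True yields the gradients
    # that define where the full lattice is worth evaluating.
    simple_loss, (px_grad, py_grad) = k2.rnnt_loss_smoothed(
        lm=lm, am=am, symbols=symbols, termination_symbol=blank_id,
        lm_only_scale=0.25, am_only_scale=0.0,
        boundary=boundary, reduction="sum", return_grad=True,
    )
    # Stage 2: prune to a band of 5 symbols per frame, then score with the
    # full joiner network.
    ranges = k2.get_rnnt_prune_ranges(px_grad, py_grad, boundary, s_range=5)
    am_pruned, lm_pruned = k2.do_rnnt_pruning(am=am, lm=lm, ranges=ranges)
    logits = joiner(am_pruned, lm_pruned)
    pruned_loss = k2.rnnt_loss_pruned(
        logits, symbols, ranges, termination_symbol=blank_id,
        boundary=boundary, reduction="sum",
    )
    return 0.5 * simple_loss + pruned_loss  # simple_loss_scale = 0.5
```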
2024-10-08 00:17:38,275 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=125.11 vs. limit=8.715
2024-10-08 00:17:43,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3240.0, ans=0.7866
2024-10-08 00:17:52,775 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.46 vs. limit=9.93
2024-10-08 00:17:57,141 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.30 vs. limit=9.93
2024-10-08 00:18:09,428 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.44 vs. limit=3.504
2024-10-08 00:18:09,866 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=37.79 vs. limit=8.76
2024-10-08 00:18:15,811 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.76 vs. limit=8.76
2024-10-08 00:18:19,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3360.0, ans=0.7824
2024-10-08 00:18:30,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=3480.0, ans=7.175
2024-10-08 00:18:31,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3480.0, ans=8.805
2024-10-08 00:18:38,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3480.0, ans=0.065
2024-10-08 00:18:39,414 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=9.91 vs. limit=10.11
2024-10-08 00:18:41,168 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=51.95 vs. limit=8.805
2024-10-08 00:18:44,015 INFO [train.py:1136] (0/2) Epoch 1, batch 300, loss[loss=1.013, simple_loss=0.8455, pruned_loss=0.9823, over 87310.00 frames. ], tot_loss[loss=1.35, simple_loss=1.173, pruned_loss=1.285, over 13291502.46 frames. ], batch size: 330, lr: 3.60e-02, grad_scale: 0.0625
2024-10-08 00:18:46,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3600.0, ans=0.33125
2024-10-08 00:18:48,487 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3600.0, ans=0.04999999999999999
2024-10-08 00:19:01,780 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=21.55 vs. limit=6.8
2024-10-08 00:19:02,730 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 2.595e+02 3.422e+02 4.681e+02 6.676e+02 3.026e+03, threshold=9.363e+02, percent-clipped=22.0
2024-10-08 00:19:03,950 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=3600.0, ans=8.85
2024-10-08 00:19:22,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=3720.0, ans=0.07675000000000001
2024-10-08 00:19:27,172 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=42.02 vs. limit=8.895
2024-10-08 00:19:39,521 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.66 vs. limit=8.94
2024-10-08 00:20:13,955 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.02 vs. limit=10.47
2024-10-08 00:20:21,959 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=39.96 vs. limit=9.03
2024-10-08 00:20:38,062 INFO [train.py:1136] (0/2) Epoch 1, batch 350, loss[loss=1.06, simple_loss=0.8848, pruned_loss=0.98, over 86494.00 frames. ], tot_loss[loss=1.263, simple_loss=1.088, pruned_loss=1.2, over 14168358.85 frames. ], batch size: 620, lr: 3.83e-02, grad_scale: 0.0625
2024-10-08 00:20:39,454 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=33.21 vs. limit=9.075
2024-10-08 00:20:41,214 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.91 vs. limit=6.05
2024-10-08 00:20:45,033 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=41.80 vs. limit=9.075
2024-10-08 00:20:46,828 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=51.08 vs. limit=9.075
2024-10-08 00:20:48,678 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4200.0, ans=0.09899494936611666
2024-10-08 00:21:14,158 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=4320.0, ans=9.120000000000001
2024-10-08 00:21:20,128 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.67 vs. limit=10.83
2024-10-08 00:21:24,246 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=21.92 vs. limit=7.220000000000001
2024-10-08 00:21:32,369 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=6.38 vs. limit=5.776
2024-10-08 00:21:32,594 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=20.28 vs. limit=9.165
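The per-batch "lr:" values climb from 2.25e-02 (half of base_lr=0.045) to 4.49e-02 over the first few hundred batches and later decay: that is the shape of icefall's Eden scheduler, a power-law decay in both batch count (lr_batches=7500) and epoch (lr_epochs=3.5) with a short linear warmup. An approximate reimplementation; the warmup length and start factor are assumptions fitted to the logged values:

```python
def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5,
            warmup_batches=500.0, warmup_start=0.5):
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    warmup = min(1.0, warmup_start + (1 - warmup_start) * batch / warmup_batches)
    return base_lr * batch_factor * epoch_factor * warmup


print(eden_lr(0.045, batch=0, epoch=1))    # ~2.2e-02, cf. "lr: 2.25e-02" at batch 0
print(eden_lr(0.045, batch=600, epoch=1))  # ~4.4e-02, cf. "lr: 4.49e-02" at batch 600
```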
2024-10-08 00:21:38,756 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=91.20 vs. limit=9.165
2024-10-08 00:21:40,739 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=30.91 vs. limit=10.83
2024-10-08 00:21:52,958 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=21.45 vs. limit=9.21
2024-10-08 00:22:28,369 INFO [train.py:1136] (0/2) Epoch 1, batch 400, loss[loss=1.076, simple_loss=0.9005, pruned_loss=0.9475, over 78570.00 frames. ], tot_loss[loss=1.198, simple_loss=1.024, pruned_loss=1.132, over 14835438.26 frames. ], batch size: 1493, lr: 4.05e-02, grad_scale: 0.125
2024-10-08 00:22:31,762 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.41 vs. limit=5.92
2024-10-08 00:22:42,157 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.29 vs. limit=5.92
2024-10-08 00:22:44,423 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=179.76 vs. limit=9.3
2024-10-08 00:22:45,311 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.457e+02 4.806e+02 7.174e+02 1.015e+03 2.303e+03, threshold=1.435e+03, percent-clipped=29.0
2024-10-08 00:22:46,567 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.34 vs. limit=11.1
2024-10-08 00:22:51,042 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.46 vs. limit=3.738
2024-10-08 00:22:51,990 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4920.0, ans=0.2508
2024-10-08 00:23:00,426 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=16.50 vs. limit=9.345
2024-10-08 00:23:09,609 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.73 vs. limit=9.39
2024-10-08 00:23:13,006 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=38.56 vs. limit=9.39
2024-10-08 00:23:24,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=5040.0, ans=0.7236
2024-10-08 00:23:24,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=5040.0, ans=0.26375000000000004
2024-10-08 00:23:30,637 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=57.00 vs. limit=9.39
2024-10-08 00:23:48,746 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=58.13 vs. limit=9.435
2024-10-08 00:23:50,530 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=22.91 vs. limit=9.435
2024-10-08 00:23:56,034 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5280.0, ans=0.2525
2024-10-08 00:23:56,713 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=24.09 vs. limit=9.48
2024-10-08 00:23:57,080 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=20.04 vs. limit=9.48
2024-10-08 00:24:16,598 INFO [train.py:1136] (0/2) Epoch 1, batch 450, loss[loss=1.052, simple_loss=0.871, pruned_loss=0.9219, over 85486.00 frames. ], tot_loss[loss=1.152, simple_loss=0.9771, pruned_loss=1.076, over 15342410.28 frames. ], batch size: 787, lr: 4.28e-02, grad_scale: 0.125
2024-10-08 00:24:25,688 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.65 vs. limit=9.525
2024-10-08 00:24:26,398 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=31.40 vs. limit=9.525
2024-10-08 00:24:38,498 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.63 vs. limit=9.525
2024-10-08 00:24:59,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=5640.0, ans=0.009643478260869566
2024-10-08 00:24:59,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5640.0, ans=0.24359999999999998
2024-10-08 00:25:05,616 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=5640.0, ans=0.04316666666666667
2024-10-08 00:25:10,745 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=16.12 vs. limit=6.41
2024-10-08 00:25:26,772 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=148.01 vs. limit=9.66
2024-10-08 00:25:26,787 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.82 vs. limit=11.82
2024-10-08 00:25:30,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=5760.0, ans=0.22999999999999998
2024-10-08 00:25:31,112 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=27.77 vs. limit=9.66
2024-10-08 00:26:06,676 INFO [train.py:1136] (0/2) Epoch 1, batch 500, loss[loss=0.9812, simple_loss=0.814, pruned_loss=0.8271, over 87408.00 frames. ], tot_loss[loss=1.117, simple_loss=0.9424, pruned_loss=1.027, over 15759577.51 frames. ], batch size: 490, lr: 4.49e-02, grad_scale: 0.25
2024-10-08 00:26:10,003 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.61 vs. limit=9.75
2024-10-08 00:26:13,709 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=6.99 vs. limit=6.4
2024-10-08 00:26:22,579 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.970e+02 7.452e+02 9.993e+02 1.450e+03 3.432e+03, threshold=1.999e+03, percent-clipped=25.0
2024-10-08 00:26:48,014 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=87.75 vs. limit=9.795
2024-10-08 00:26:59,989 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.74 vs. limit=6.5600000000000005
2024-10-08 00:27:01,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=6240.0, ans=0.23759999999999998
2024-10-08 00:27:09,593 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.08 vs. limit=12.27
2024-10-08 00:27:13,503 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.14 vs. limit=9.885
2024-10-08 00:27:24,088 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.42 vs. limit=12.27
2024-10-08 00:27:27,381 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=6360.0, ans=0.20187500000000003
2024-10-08 00:27:33,635 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.48 vs. limit=6.5920000000000005
2024-10-08 00:27:40,341 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.32 vs. limit=12.36
2024-10-08 00:27:40,674 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=23.00 vs. limit=9.93
2024-10-08 00:27:55,176 INFO [train.py:1136] (0/2) Epoch 1, batch 550, loss[loss=0.906, simple_loss=0.7648, pruned_loss=0.7085, over 87142.00 frames. ], tot_loss[loss=1.079, simple_loss=0.9084, pruned_loss=0.9677, over 16077335.08 frames. ], batch size: 350, lr: 4.49e-02, grad_scale: 0.25
2024-10-08 00:27:56,335 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.07 vs. limit=9.975
2024-10-08 00:28:01,780 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.63 vs. limit=9.975
2024-10-08 00:28:05,196 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=6600.0, ans=0.190625
2024-10-08 00:28:19,577 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.01 vs. limit=10.02
2024-10-08 00:28:21,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=6720.0, ans=0.185
2024-10-08 00:28:42,392 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=14.36 vs. limit=10.065
2024-10-08 00:28:55,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=6840.0, ans=0.03816666666666667
2024-10-08 00:29:03,641 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.63 vs. limit=10.11
2024-10-08 00:29:06,835 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=6960.0, ans=0.03766666666666667
2024-10-08 00:29:23,165 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.38 vs. limit=10.155
2024-10-08 00:29:31,159 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.09 vs. limit=10.155
2024-10-08 00:29:41,784 INFO [train.py:1136] (0/2) Epoch 1, batch 600, loss[loss=0.9194, simple_loss=0.7747, pruned_loss=0.7049, over 81929.00 frames. ], tot_loss[loss=1.035, simple_loss=0.8731, pruned_loss=0.8991, over 16292921.66 frames. ], batch size: 1245, lr: 4.49e-02, grad_scale: 0.5
2024-10-08 00:29:53,544 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=7200.0, ans=0.16249999999999998
2024-10-08 00:29:57,243 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 5.431e+02 9.556e+02 1.220e+03 1.911e+03 5.564e+03, threshold=2.439e+03, percent-clipped=21.0
2024-10-08 00:30:01,862 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.55 vs. limit=10.245000000000001
2024-10-08 00:30:11,054 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=7320.0, ans=0.02
2024-10-08 00:30:19,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=7440.0, ans=10.29
2024-10-08 00:30:32,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=7440.0, ans=0.0
2024-10-08 00:30:35,442 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=17.33 vs. limit=10.29
2024-10-08 00:30:40,989 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.80 vs. limit=10.29
2024-10-08 00:30:50,730 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=6.35 vs. limit=5.0
2024-10-08 00:31:25,341 INFO [train.py:1136] (0/2) Epoch 1, batch 650, loss[loss=0.7526, simple_loss=0.6619, pruned_loss=0.505, over 87036.00 frames. ], tot_loss[loss=0.9812, simple_loss=0.8319, pruned_loss=0.8228, over 16447189.91 frames. ], batch size: 547, lr: 4.49e-02, grad_scale: 0.5
2024-10-08 00:31:42,571 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.04 vs. limit=4.188
2024-10-08 00:31:47,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=7920.0, ans=0.025
2024-10-08 00:31:47,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=7920.0, ans=0.2208
2024-10-08 00:32:15,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=8040.0, ans=0.3206
2024-10-08 00:32:51,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=8280.0, ans=0.125
2024-10-08 00:32:56,319 INFO [train.py:1136] (0/2) Epoch 1, batch 700, loss[loss=0.7122, simple_loss=0.6319, pruned_loss=0.4598, over 86922.00 frames. ], tot_loss[loss=0.9237, simple_loss=0.7891, pruned_loss=0.7444, over 16621402.55 frames. ], batch size: 583, lr: 4.49e-02, grad_scale: 1.0
2024-10-08 00:33:04,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=8400.0, ans=0.216
2024-10-08 00:33:09,375 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 5.683e+02 8.992e+02 1.352e+03 2.272e+03 3.408e+03, threshold=2.704e+03, percent-clipped=21.0
2024-10-08 00:33:09,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=8400.0, ans=0.125
2024-10-08 00:33:18,461 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.24 vs. limit=9.26
2024-10-08 00:33:34,597 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.155e+01
2024-10-08 00:34:21,964 INFO [train.py:1136] (0/2) Epoch 1, batch 750, loss[loss=0.6547, simple_loss=0.5942, pruned_loss=0.3929, over 87002.00 frames. ], tot_loss[loss=0.8691, simple_loss=0.7486, pruned_loss=0.673, over 16693608.95 frames. ], batch size: 583, lr: 4.49e-02, grad_scale: 1.0
2024-10-08 00:34:34,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=9000.0, ans=0.125
2024-10-08 00:34:40,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=9120.0, ans=0.125
2024-10-08 00:35:46,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=9600.0, ans=0.125
2024-10-08 00:35:47,013 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.76 vs. limit=7.84
2024-10-08 00:35:47,899 INFO [train.py:1136] (0/2) Epoch 1, batch 800, loss[loss=0.736, simple_loss=0.6488, pruned_loss=0.4716, over 78558.00 frames. ], tot_loss[loss=0.8214, simple_loss=0.7137, pruned_loss=0.6112, over 16722525.80 frames. ], batch size: 1493, lr: 4.49e-02, grad_scale: 2.0
2024-10-08 00:35:52,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=9600.0, ans=0.125
2024-10-08 00:35:57,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=9600.0, ans=0.02666666666666667
2024-10-08 00:36:01,933 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.243e+02 1.124e+03 1.511e+03 2.371e+03 4.535e+03, threshold=3.022e+03, percent-clipped=15.0
2024-10-08 00:36:03,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=9720.0, ans=0.125
2024-10-08 00:36:05,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=9720.0, ans=0.04949747468305833
2024-10-08 00:36:14,868 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-1.pt
2024-10-08 00:37:10,096 INFO [train.py:1136] (0/2) Epoch 2, batch 0, loss[loss=0.6591, simple_loss=0.599, pruned_loss=0.3901, over 84437.00 frames. ], tot_loss[loss=0.6591, simple_loss=0.599, pruned_loss=0.3901, over 84437.00 frames. ], batch size: 958, lr: 4.40e-02, grad_scale: 4.0
2024-10-08 00:37:10,097 INFO [train.py:1159] (0/2) Computing validation loss
2024-10-08 00:37:20,960 INFO [train.py:1168] (0/2) Epoch 2, validation: loss=0.4861, simple_loss=0.4823, pruned_loss=0.2187, over 1382211.00 frames.
2024-10-08 00:37:20,961 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB
2024-10-08 00:38:05,099 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=10032.0, ans=0.125
2024-10-08 00:38:06,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=10032.0, ans=0.02486666666666667
2024-10-08 00:38:58,614 INFO [train.py:1136] (0/2) Epoch 2, batch 50, loss[loss=0.5406, simple_loss=0.5058, pruned_loss=0.2946, over 87073.00 frames. ], tot_loss[loss=0.5993, simple_loss=0.5527, pruned_loss=0.3403, over 3841419.69 frames. ], batch size: 296, lr: 4.40e-02, grad_scale: 1.0
2024-10-08 00:39:20,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=10512.0, ans=0.14488
2024-10-08 00:39:38,986 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=10632.0, ans=0.09899494936611666
2024-10-08 00:40:04,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=10752.0, ans=0.52368
2024-10-08 00:40:15,890 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.09 vs. limit=15.654
2024-10-08 00:40:23,592 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 5.958e+02 9.574e+02 1.271e+03 1.982e+03 3.370e+03, threshold=2.543e+03, percent-clipped=3.0
2024-10-08 00:40:36,422 INFO [train.py:1136] (0/2) Epoch 2, batch 100, loss[loss=0.5239, simple_loss=0.4967, pruned_loss=0.2753, over 86586.00 frames. ], tot_loss[loss=0.5763, simple_loss=0.5371, pruned_loss=0.3178, over 6792147.69 frames. ], batch size: 213, lr: 4.40e-02, grad_scale: 2.0
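The checkpoint.py lines save zipformer/exp/epoch-1.pt at the epoch boundary; per the config, the recipe also keeps batch-level checkpoints every save_every_n=4000 steps (retaining keep_last_k=30) and maintains a running model average (average_period=200). A minimal sketch of an epoch save; icefall's checkpoint.py additionally stores sampler, GradScaler, and best-loss state:

```python
import torch


def save_checkpoint(filename, model, optimizer, scheduler, epoch, batch_idx_train):
    # Minimal sketch of what goes into epoch-N.pt; the real helper saves more.
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "scheduler": scheduler.state_dict(),
            "epoch": epoch,
            "batch_idx_train": batch_idx_train,
        },
        filename,
    )
```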
2024-10-08 00:40:55,154 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.13 vs. limit=7.7780000000000005
2024-10-08 00:40:58,045 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=11112.0, ans=0.51108
2024-10-08 00:41:01,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=11112.0, ans=0.18888
2024-10-08 00:41:02,094 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.75 vs. limit=11.667
2024-10-08 00:41:12,220 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.97 vs. limit=7.7780000000000005
2024-10-08 00:41:15,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=11232.0, ans=0.125
2024-10-08 00:41:19,648 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=11232.0, ans=0.0
2024-10-08 00:41:22,149 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.35 vs. limit=4.6848
2024-10-08 00:41:43,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=11352.0, ans=0.0
2024-10-08 00:41:50,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=11472.0, ans=0.008375652173913044
2024-10-08 00:41:58,570 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=11472.0, ans=0.025
2024-10-08 00:42:14,762 INFO [train.py:1136] (0/2) Epoch 2, batch 150, loss[loss=0.5265, simple_loss=0.5049, pruned_loss=0.2685, over 87331.00 frames. ], tot_loss[loss=0.5629, simple_loss=0.528, pruned_loss=0.3049, over 9083152.55 frames. ], batch size: 490, lr: 4.39e-02, grad_scale: 1.0
2024-10-08 00:42:34,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=11712.0, ans=0.125
2024-10-08 00:42:41,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=11712.0, ans=0.18288
2024-10-08 00:43:12,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=11952.0, ans=0.04949747468305833
2024-10-08 00:43:19,747 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=11952.0, ans=0.01686666666666667
2024-10-08 00:43:30,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=12072.0, ans=0.125
2024-10-08 00:43:31,661 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=12072.0, ans=0.125
2024-10-08 00:43:37,728 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.314e+02 9.767e+02 1.396e+03 1.809e+03 4.070e+03, threshold=2.792e+03, percent-clipped=7.0
2024-10-08 00:43:49,087 INFO [train.py:1136] (0/2) Epoch 2, batch 200, loss[loss=0.5496, simple_loss=0.5264, pruned_loss=0.2815, over 85326.00 frames. ], tot_loss[loss=0.5503, simple_loss=0.5201, pruned_loss=0.2922, over 10845153.92 frames. ], batch size: 787, lr: 4.39e-02, grad_scale: 2.0
2024-10-08 00:44:14,585 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.50 vs. limit=12.117
2024-10-08 00:44:26,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=12432.0, ans=0.46488
2024-10-08 00:44:44,760 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.60 vs. limit=4.8828
2024-10-08 00:44:47,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=12552.0, ans=0.46068000000000003
2024-10-08 00:44:55,388 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.00 vs. limit=8.138
2024-10-08 00:45:18,209 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.75 vs. limit=11.336
2024-10-08 00:45:22,368 INFO [train.py:1136] (0/2) Epoch 2, batch 250, loss[loss=0.46, simple_loss=0.4589, pruned_loss=0.2129, over 87057.00 frames. ], tot_loss[loss=0.5361, simple_loss=0.5105, pruned_loss=0.2794, over 12241757.23 frames. ], batch size: 350, lr: 4.39e-02, grad_scale: 1.0
2024-10-08 00:45:44,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=12912.0, ans=0.125
2024-10-08 00:46:01,640 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.82 vs. limit=12.387
2024-10-08 00:46:16,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=13032.0, ans=0.125
2024-10-08 00:46:23,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=13152.0, ans=0.125
2024-10-08 00:46:31,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=13152.0, ans=0.008010434782608697
2024-10-08 00:46:36,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=13152.0, ans=0.125
2024-10-08 00:46:52,264 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 5.699e+02 9.775e+02 1.288e+03 1.989e+03 4.930e+03, threshold=2.576e+03, percent-clipped=9.0
2024-10-08 00:46:59,122 INFO [train.py:1136] (0/2) Epoch 2, batch 300, loss[loss=0.4617, simple_loss=0.459, pruned_loss=0.2171, over 87213.00 frames. ], tot_loss[loss=0.5217, simple_loss=0.5008, pruned_loss=0.2668, over 13337252.48 frames. ], batch size: 280, lr: 4.39e-02, grad_scale: 2.0
2024-10-08 00:47:05,334 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.29 vs. limit=5.0088
2024-10-08 00:47:33,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=13512.0, ans=0.125
2024-10-08 00:47:54,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=13752.0, ans=0.05
2024-10-08 00:48:32,381 INFO [train.py:1136] (0/2) Epoch 2, batch 350, loss[loss=0.4186, simple_loss=0.4282, pruned_loss=0.1846, over 87190.00 frames. ], tot_loss[loss=0.5055, simple_loss=0.4899, pruned_loss=0.2533, over 14194573.63 frames. ], batch size: 264, lr: 4.39e-02, grad_scale: 2.0
2024-10-08 00:48:33,386 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.56 vs. limit=12.747
2024-10-08 00:49:13,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=14232.0, ans=0.125
2024-10-08 00:49:18,763 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=14232.0, ans=0.15768000000000001
2024-10-08 00:49:34,269 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=14352.0, ans=0.125
2024-10-08 00:50:01,580 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.374e+02 1.039e+03 1.303e+03 1.855e+03 7.453e+03, threshold=2.607e+03, percent-clipped=7.0
2024-10-08 00:50:06,922 INFO [train.py:1136] (0/2) Epoch 2, batch 400, loss[loss=0.4859, simple_loss=0.4846, pruned_loss=0.2296, over 85485.00 frames. ], tot_loss[loss=0.496, simple_loss=0.4835, pruned_loss=0.2455, over 14844545.15 frames. ], batch size: 787, lr: 4.38e-02, grad_scale: 2.0
], batch size: 787, lr: 4.38e-02, grad_scale: 2.0 2024-10-08 00:50:39,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=14712.0, ans=0.007671304347826088 2024-10-08 00:50:44,834 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=14712.0, ans=0.005366666666666665 2024-10-08 00:51:17,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=14952.0, ans=0.2 2024-10-08 00:51:45,875 INFO [train.py:1136] (0/2) Epoch 2, batch 450, loss[loss=0.4284, simple_loss=0.4416, pruned_loss=0.189, over 87419.00 frames. ], tot_loss[loss=0.4874, simple_loss=0.4788, pruned_loss=0.2378, over 15311085.35 frames. ], batch size: 490, lr: 4.38e-02, grad_scale: 1.0 2024-10-08 00:51:51,869 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.75 vs. limit=18.894 2024-10-08 00:52:40,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=15432.0, ans=0.0023666666666666697 2024-10-08 00:52:58,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=15552.0, ans=0.125 2024-10-08 00:53:11,235 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.81 vs. limit=13.376999999999999 2024-10-08 00:53:12,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=15672.0, ans=0.125 2024-10-08 00:53:19,115 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.001e+02 1.023e+03 1.302e+03 1.850e+03 4.735e+03, threshold=2.604e+03, percent-clipped=6.0 2024-10-08 00:53:22,784 INFO [train.py:1136] (0/2) Epoch 2, batch 500, loss[loss=0.4136, simple_loss=0.432, pruned_loss=0.1789, over 87435.00 frames. ], tot_loss[loss=0.4737, simple_loss=0.47, pruned_loss=0.2269, over 15721404.38 frames. ], batch size: 393, lr: 4.38e-02, grad_scale: 2.0 2024-10-08 00:53:27,506 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.42 vs. limit=13.422 2024-10-08 00:53:31,738 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=15792.0, ans=0.125 2024-10-08 00:53:37,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=15792.0, ans=0.007436521739130435 2024-10-08 00:53:51,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=15912.0, ans=0.34308000000000005 2024-10-08 00:54:07,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=16032.0, ans=0.13968 2024-10-08 00:54:53,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=16272.0, ans=0.125 2024-10-08 00:54:59,537 INFO [train.py:1136] (0/2) Epoch 2, batch 550, loss[loss=0.4231, simple_loss=0.4432, pruned_loss=0.1836, over 86891.00 frames. ], tot_loss[loss=0.4641, simple_loss=0.4641, pruned_loss=0.2193, over 16029202.79 frames. 
], batch size: 583, lr: 4.38e-02, grad_scale: 1.0 2024-10-08 00:55:01,747 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=16392.0, ans=0.125 2024-10-08 00:55:27,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=16512.0, ans=0.13488 2024-10-08 00:56:31,491 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 5.127e+02 7.867e+02 1.113e+03 1.912e+03 9.117e+03, threshold=2.226e+03, percent-clipped=15.0 2024-10-08 00:56:31,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=16992.0, ans=0.0 2024-10-08 00:56:33,262 INFO [train.py:1136] (0/2) Epoch 2, batch 600, loss[loss=0.5095, simple_loss=0.5005, pruned_loss=0.2525, over 82021.00 frames. ], tot_loss[loss=0.4536, simple_loss=0.4578, pruned_loss=0.2112, over 16234082.95 frames. ], batch size: 1245, lr: 4.37e-02, grad_scale: 2.0 2024-10-08 00:56:37,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=16992.0, ans=0.125 2024-10-08 00:56:49,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=17112.0, ans=10.0 2024-10-08 00:56:56,244 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=17112.0, ans=0.125 2024-10-08 00:57:14,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=17232.0, ans=0.29688000000000003 2024-10-08 00:57:21,379 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-10-08 00:57:31,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=17352.0, ans=0.04949747468305833 2024-10-08 00:57:52,279 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=17472.0, ans=0.46208 2024-10-08 00:58:04,581 INFO [train.py:1136] (0/2) Epoch 2, batch 650, loss[loss=0.3749, simple_loss=0.404, pruned_loss=0.1566, over 87337.00 frames. ], tot_loss[loss=0.4417, simple_loss=0.4504, pruned_loss=0.2026, over 16456136.17 frames. ], batch size: 264, lr: 4.37e-02, grad_scale: 1.0 2024-10-08 00:58:16,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=17592.0, ans=0.007045217391304348 2024-10-08 00:58:42,555 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=17832.0, ans=0.0 2024-10-08 00:59:13,591 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.43 vs. limit=14.277000000000001 2024-10-08 00:59:19,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=18072.0, ans=0.11928 2024-10-08 00:59:26,275 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.70 vs. 
limit=14.277000000000001 2024-10-08 00:59:30,223 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 5.271e+02 1.104e+03 1.591e+03 2.759e+03 5.605e+03, threshold=3.183e+03, percent-clipped=37.0 2024-10-08 00:59:30,242 INFO [train.py:1136] (0/2) Epoch 2, batch 700, loss[loss=0.3779, simple_loss=0.4152, pruned_loss=0.1536, over 87354.00 frames. ], tot_loss[loss=0.4317, simple_loss=0.4447, pruned_loss=0.1952, over 16618816.00 frames. ], batch size: 415, lr: 4.37e-02, grad_scale: 2.0 2024-10-08 00:59:30,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=18192.0, ans=0.125 2024-10-08 00:59:43,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=18192.0, ans=0.0069147826086956524 2024-10-08 00:59:54,585 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=18312.0, ans=0.0 2024-10-08 00:59:57,892 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=18312.0, ans=0.0 2024-10-08 01:00:00,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=18312.0, ans=0.006888695652173913 2024-10-08 01:00:15,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=18432.0, ans=0.006862608695652174 2024-10-08 01:00:22,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=18552.0, ans=0.11448 2024-10-08 01:00:54,649 INFO [train.py:1136] (0/2) Epoch 2, batch 750, loss[loss=0.3797, simple_loss=0.4183, pruned_loss=0.1555, over 87326.00 frames. ], tot_loss[loss=0.4252, simple_loss=0.4411, pruned_loss=0.1907, over 16717945.14 frames. ], batch size: 464, lr: 4.37e-02, grad_scale: 2.0 2024-10-08 01:00:59,648 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=18792.0, ans=0.11208000000000001 2024-10-08 01:01:29,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=19032.0, ans=0.10968 2024-10-08 01:01:49,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=19152.0, ans=0.125 2024-10-08 01:02:09,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=19272.0, ans=0.10728000000000001 2024-10-08 01:02:17,064 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 6.290e+02 9.719e+02 1.463e+03 2.016e+03 4.619e+03, threshold=2.926e+03, percent-clipped=5.0 2024-10-08 01:02:17,083 INFO [train.py:1136] (0/2) Epoch 2, batch 800, loss[loss=0.488, simple_loss=0.4887, pruned_loss=0.2378, over 78797.00 frames. ], tot_loss[loss=0.4211, simple_loss=0.4395, pruned_loss=0.188, over 16692082.81 frames. ], batch size: 1493, lr: 4.36e-02, grad_scale: 4.0 2024-10-08 01:02:27,998 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.55 vs. 
limit=11.7568 2024-10-08 01:02:43,554 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-2.pt 2024-10-08 01:03:41,619 INFO [train.py:1136] (0/2) Epoch 3, batch 0, loss[loss=0.3491, simple_loss=0.392, pruned_loss=0.1404, over 86590.00 frames. ], tot_loss[loss=0.3491, simple_loss=0.392, pruned_loss=0.1404, over 86590.00 frames. ], batch size: 213, lr: 4.14e-02, grad_scale: 8.0 2024-10-08 01:03:41,620 INFO [train.py:1159] (0/2) Computing validation loss 2024-10-08 01:03:54,996 INFO [train.py:1168] (0/2) Epoch 3, validation: loss=0.2822, simple_loss=0.3661, pruned_loss=0.07849, over 1382211.00 frames. 2024-10-08 01:03:54,997 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB 2024-10-08 01:04:03,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=19584.0, ans=0.10416 2024-10-08 01:04:26,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=19704.0, ans=0.0 2024-10-08 01:04:36,330 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=19824.0, ans=0.125 2024-10-08 01:04:38,603 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=6.55 vs. limit=11.9296 2024-10-08 01:04:43,013 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=19824.0, ans=0.05176 2024-10-08 01:04:51,754 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.17 vs. limit=14.979 2024-10-08 01:04:53,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=19944.0, ans=0.125 2024-10-08 01:04:58,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=19944.0, ans=0.20196000000000003 2024-10-08 01:05:22,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=20064.0, ans=0.09899494936611666 2024-10-08 01:05:26,849 INFO [train.py:1136] (0/2) Epoch 3, batch 50, loss[loss=0.335, simple_loss=0.3801, pruned_loss=0.1339, over 86193.00 frames. ], tot_loss[loss=0.3677, simple_loss=0.4078, pruned_loss=0.1525, over 3873176.78 frames. ], batch size: 197, lr: 4.14e-02, grad_scale: 2.0 2024-10-08 01:05:54,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=20304.0, ans=0.2 2024-10-08 01:05:59,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=20304.0, ans=0.5 2024-10-08 01:06:01,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=20304.0, ans=0.2 2024-10-08 01:06:33,745 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 5.468e+02 8.559e+02 1.104e+03 1.745e+03 5.552e+03, threshold=2.208e+03, percent-clipped=6.0 2024-10-08 01:07:05,858 INFO [train.py:1136] (0/2) Epoch 3, batch 100, loss[loss=0.4428, simple_loss=0.4605, pruned_loss=0.2067, over 82032.00 frames. ], tot_loss[loss=0.3736, simple_loss=0.4133, pruned_loss=0.1566, over 6757996.29 frames. 
], batch size: 1245, lr: 4.14e-02, grad_scale: 4.0 2024-10-08 01:07:21,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=20904.0, ans=0.5 2024-10-08 01:07:25,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=20904.0, ans=0.125 2024-10-08 01:07:45,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=21024.0, ans=0.125 2024-10-08 01:08:01,991 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.28 vs. limit=22.5 2024-10-08 01:08:41,513 INFO [train.py:1136] (0/2) Epoch 3, batch 150, loss[loss=0.4064, simple_loss=0.4457, pruned_loss=0.1766, over 84678.00 frames. ], tot_loss[loss=0.3729, simple_loss=0.4138, pruned_loss=0.1568, over 9040191.92 frames. ], batch size: 958, lr: 4.14e-02, grad_scale: 2.0 2024-10-08 01:08:49,684 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.71 vs. limit=5.0 2024-10-08 01:08:56,227 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=25.52 vs. limit=22.5 2024-10-08 01:09:11,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=21504.0, ans=0.1 2024-10-08 01:09:21,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=21624.0, ans=0.0 2024-10-08 01:09:40,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=21744.0, ans=0.1 2024-10-08 01:09:49,510 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.32 vs. limit=10.0 2024-10-08 01:09:50,085 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 5.935e+02 9.041e+02 1.823e+03 3.076e+03 7.250e+03, threshold=3.646e+03, percent-clipped=37.0 2024-10-08 01:10:01,932 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=5.94 vs. limit=12.0 2024-10-08 01:10:15,568 INFO [train.py:1136] (0/2) Epoch 3, batch 200, loss[loss=0.3426, simple_loss=0.3939, pruned_loss=0.1398, over 87415.00 frames. ], tot_loss[loss=0.3665, simple_loss=0.4101, pruned_loss=0.1531, over 10828407.08 frames. ], batch size: 296, lr: 4.13e-02, grad_scale: 4.0 2024-10-08 01:10:35,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=22104.0, ans=0.025 2024-10-08 01:11:03,274 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=22224.0, ans=0.0 2024-10-08 01:11:09,809 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.87 vs. 
limit=22.5 2024-10-08 01:11:19,174 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=22344.0, ans=0.1 2024-10-08 01:11:23,831 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.72 vs. limit=15.0 2024-10-08 01:11:47,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=22584.0, ans=0.04949747468305833 2024-10-08 01:11:48,559 INFO [train.py:1136] (0/2) Epoch 3, batch 250, loss[loss=0.3397, simple_loss=0.4014, pruned_loss=0.1343, over 87512.00 frames. ], tot_loss[loss=0.3625, simple_loss=0.4086, pruned_loss=0.1508, over 12228565.37 frames. ], batch size: 490, lr: 4.13e-02, grad_scale: 4.0 2024-10-08 01:11:56,517 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.93 vs. limit=22.5 2024-10-08 01:12:05,810 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.24 vs. limit=15.0 2024-10-08 01:12:25,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=22704.0, ans=0.125 2024-10-08 01:12:34,671 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.40 vs. limit=10.0 2024-10-08 01:12:51,929 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-10-08 01:12:53,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=22944.0, ans=0.125 2024-10-08 01:13:02,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=22944.0, ans=0.1 2024-10-08 01:13:03,755 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 5.393e+02 8.854e+02 1.457e+03 2.089e+03 6.039e+03, threshold=2.914e+03, percent-clipped=4.0 2024-10-08 01:13:05,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=23064.0, ans=0.0 2024-10-08 01:13:17,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=23064.0, ans=0.025 2024-10-08 01:13:24,274 INFO [train.py:1136] (0/2) Epoch 3, batch 300, loss[loss=0.3334, simple_loss=0.3886, pruned_loss=0.1366, over 87204.00 frames. ], tot_loss[loss=0.3596, simple_loss=0.4079, pruned_loss=0.1495, over 13293095.76 frames. ], batch size: 264, lr: 4.13e-02, grad_scale: 4.0 2024-10-08 01:13:28,662 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.41 vs. 
limit=10.0 2024-10-08 01:13:49,836 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=23304.0, ans=0.95 2024-10-08 01:14:04,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=23424.0, ans=0.125 2024-10-08 01:14:26,706 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=23544.0, ans=0.1 2024-10-08 01:14:29,080 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.48 vs. limit=10.0 2024-10-08 01:15:00,193 INFO [train.py:1136] (0/2) Epoch 3, batch 350, loss[loss=0.3008, simple_loss=0.369, pruned_loss=0.1157, over 87056.00 frames. ], tot_loss[loss=0.3538, simple_loss=0.4053, pruned_loss=0.1463, over 14142458.11 frames. ], batch size: 264, lr: 4.12e-02, grad_scale: 4.0 2024-10-08 01:15:07,796 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.61 vs. limit=15.0 2024-10-08 01:15:14,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=23784.0, ans=0.125 2024-10-08 01:15:52,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=24024.0, ans=0.125 2024-10-08 01:16:08,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=24144.0, ans=0.1 2024-10-08 01:16:12,583 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 5.295e+02 1.031e+03 1.651e+03 2.455e+03 7.083e+03, threshold=3.303e+03, percent-clipped=14.0 2024-10-08 01:16:23,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=24264.0, ans=0.0 2024-10-08 01:16:28,292 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.96 vs. limit=15.0 2024-10-08 01:16:36,047 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.94 vs. limit=15.0 2024-10-08 01:16:36,757 INFO [train.py:1136] (0/2) Epoch 3, batch 400, loss[loss=0.4212, simple_loss=0.4485, pruned_loss=0.1969, over 69543.00 frames. ], tot_loss[loss=0.35, simple_loss=0.4033, pruned_loss=0.1447, over 14793484.12 frames. ], batch size: 1960, lr: 4.12e-02, grad_scale: 4.0 2024-10-08 01:17:06,660 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-10-08 01:17:12,409 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.55 vs. 
limit=15.0 2024-10-08 01:17:13,528 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=24624.0, ans=0.125 2024-10-08 01:17:23,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=24624.0, ans=0.2 2024-10-08 01:17:54,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=24864.0, ans=0.125 2024-10-08 01:18:08,114 INFO [train.py:1136] (0/2) Epoch 3, batch 450, loss[loss=0.3775, simple_loss=0.4243, pruned_loss=0.1653, over 83140.00 frames. ], tot_loss[loss=0.3453, simple_loss=0.4008, pruned_loss=0.1421, over 15301522.76 frames. ], batch size: 1077, lr: 4.12e-02, grad_scale: 4.0 2024-10-08 01:18:25,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=24984.0, ans=0.125 2024-10-08 01:18:34,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=25104.0, ans=0.1 2024-10-08 01:18:56,014 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=25224.0, ans=22.5 2024-10-08 01:19:10,399 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=25344.0, ans=0.125 2024-10-08 01:19:25,631 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 5.556e+02 7.623e+02 1.103e+03 1.423e+03 4.387e+03, threshold=2.205e+03, percent-clipped=1.0 2024-10-08 01:19:27,095 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=25.75 vs. limit=22.5 2024-10-08 01:19:45,346 INFO [train.py:1136] (0/2) Epoch 3, batch 500, loss[loss=0.3105, simple_loss=0.377, pruned_loss=0.122, over 87189.00 frames. ], tot_loss[loss=0.341, simple_loss=0.3989, pruned_loss=0.1395, over 15692461.12 frames. ], batch size: 330, lr: 4.11e-02, grad_scale: 8.0 2024-10-08 01:20:29,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=25824.0, ans=0.125 2024-10-08 01:21:21,698 INFO [train.py:1136] (0/2) Epoch 3, batch 550, loss[loss=0.2955, simple_loss=0.3676, pruned_loss=0.1117, over 87265.00 frames. ], tot_loss[loss=0.3382, simple_loss=0.3975, pruned_loss=0.1378, over 16008157.52 frames. ], batch size: 264, lr: 4.11e-02, grad_scale: 4.0 2024-10-08 01:21:41,835 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=26304.0, ans=0.125 2024-10-08 01:21:52,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=26304.0, ans=0.2 2024-10-08 01:22:03,008 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=26424.0, ans=0.1 2024-10-08 01:22:08,001 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=26424.0, ans=0.1 2024-10-08 01:22:28,276 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.97 vs. 
limit=10.0 2024-10-08 01:22:33,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=26544.0, ans=0.025 2024-10-08 01:22:38,092 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 5.877e+02 1.042e+03 1.776e+03 2.766e+03 5.860e+03, threshold=3.551e+03, percent-clipped=41.0 2024-10-08 01:22:50,617 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=26664.0, ans=0.025 2024-10-08 01:22:52,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=26664.0, ans=15.0 2024-10-08 01:22:56,958 INFO [train.py:1136] (0/2) Epoch 3, batch 600, loss[loss=0.2724, simple_loss=0.3452, pruned_loss=0.09984, over 86770.00 frames. ], tot_loss[loss=0.3346, simple_loss=0.3953, pruned_loss=0.1357, over 16279520.97 frames. ], batch size: 213, lr: 4.10e-02, grad_scale: 8.0 2024-10-08 01:23:23,524 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.47 vs. limit=15.0 2024-10-08 01:24:10,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=27264.0, ans=0.1 2024-10-08 01:24:18,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=27264.0, ans=0.07 2024-10-08 01:24:25,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=27264.0, ans=0.125 2024-10-08 01:24:25,448 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=27264.0, ans=0.1 2024-10-08 01:24:30,125 INFO [train.py:1136] (0/2) Epoch 3, batch 650, loss[loss=0.2851, simple_loss=0.3523, pruned_loss=0.1089, over 85525.00 frames. ], tot_loss[loss=0.3306, simple_loss=0.3928, pruned_loss=0.1333, over 16490758.56 frames. ], batch size: 180, lr: 4.10e-02, grad_scale: 8.0 2024-10-08 01:24:46,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=27384.0, ans=0.1 2024-10-08 01:24:47,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=27504.0, ans=0.0 2024-10-08 01:24:53,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=27504.0, ans=0.2 2024-10-08 01:25:02,303 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.35 vs. 
limit=22.5 2024-10-08 01:25:03,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=27504.0, ans=0.125 2024-10-08 01:25:42,092 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 5.688e+02 7.358e+02 1.073e+03 1.759e+03 4.691e+03, threshold=2.146e+03, percent-clipped=4.0 2024-10-08 01:25:47,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=27864.0, ans=0.125 2024-10-08 01:25:48,870 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=27864.0, ans=0.1 2024-10-08 01:25:49,703 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.73 vs. limit=15.0 2024-10-08 01:25:56,524 INFO [train.py:1136] (0/2) Epoch 3, batch 700, loss[loss=0.3013, simple_loss=0.3794, pruned_loss=0.1116, over 87379.00 frames. ], tot_loss[loss=0.3286, simple_loss=0.3919, pruned_loss=0.1319, over 16637967.28 frames. ], batch size: 464, lr: 4.10e-02, grad_scale: 8.0 2024-10-08 01:26:13,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=28104.0, ans=0.07 2024-10-08 01:26:15,178 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=28104.0, ans=0.05 2024-10-08 01:26:21,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=28104.0, ans=0.125 2024-10-08 01:26:21,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=28104.0, ans=0.125 2024-10-08 01:26:37,253 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=28224.0, ans=0.125 2024-10-08 01:26:48,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=28344.0, ans=0.0 2024-10-08 01:27:02,769 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=28464.0, ans=0.125 2024-10-08 01:27:02,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=28464.0, ans=0.125 2024-10-08 01:27:09,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=28464.0, ans=0.1 2024-10-08 01:27:11,726 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=14.96 vs. limit=15.0 2024-10-08 01:27:20,209 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=28584.0, ans=0.125 2024-10-08 01:27:21,300 INFO [train.py:1136] (0/2) Epoch 3, batch 750, loss[loss=0.2968, simple_loss=0.37, pruned_loss=0.1118, over 87148.00 frames. ], tot_loss[loss=0.3279, simple_loss=0.3914, pruned_loss=0.1317, over 16696396.71 frames. 
], batch size: 350, lr: 4.09e-02, grad_scale: 4.0 2024-10-08 01:27:26,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=28584.0, ans=0.0 2024-10-08 01:28:13,125 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=28944.0, ans=0.125 2024-10-08 01:28:24,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=28944.0, ans=0.0 2024-10-08 01:28:30,305 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 5.799e+02 1.052e+03 1.674e+03 2.766e+03 6.209e+03, threshold=3.348e+03, percent-clipped=39.0 2024-10-08 01:28:37,003 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.51 vs. limit=15.0 2024-10-08 01:28:44,369 INFO [train.py:1136] (0/2) Epoch 3, batch 800, loss[loss=0.338, simple_loss=0.4059, pruned_loss=0.135, over 85881.00 frames. ], tot_loss[loss=0.3299, simple_loss=0.3926, pruned_loss=0.1332, over 16720789.10 frames. ], batch size: 721, lr: 4.09e-02, grad_scale: 8.0 2024-10-08 01:28:57,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=29184.0, ans=0.2 2024-10-08 01:29:10,765 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-3.pt 2024-10-08 01:30:34,364 INFO [train.py:1136] (0/2) Epoch 4, batch 0, loss[loss=0.3521, simple_loss=0.4169, pruned_loss=0.1437, over 85961.00 frames. ], tot_loss[loss=0.3521, simple_loss=0.4169, pruned_loss=0.1437, over 85961.00 frames. ], batch size: 721, lr: 3.82e-02, grad_scale: 16.0 2024-10-08 01:30:34,366 INFO [train.py:1159] (0/2) Computing validation loss 2024-10-08 01:30:45,982 INFO [train.py:1168] (0/2) Epoch 4, validation: loss=0.2285, simple_loss=0.3429, pruned_loss=0.05707, over 1382211.00 frames. 2024-10-08 01:30:45,983 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB 2024-10-08 01:31:31,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=29616.0, ans=0.125 2024-10-08 01:31:31,278 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=29616.0, ans=0.125 2024-10-08 01:31:53,448 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=29736.0, ans=0.0 2024-10-08 01:31:56,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=29736.0, ans=0.125 2024-10-08 01:32:09,079 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=29856.0, ans=0.125 2024-10-08 01:32:17,416 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=29856.0, ans=0.125 2024-10-08 01:32:20,347 INFO [train.py:1136] (0/2) Epoch 4, batch 50, loss[loss=0.3341, simple_loss=0.401, pruned_loss=0.1336, over 85839.00 frames. ], tot_loss[loss=0.3154, simple_loss=0.3846, pruned_loss=0.1231, over 3854049.67 frames. 
], batch size: 721, lr: 3.82e-02, grad_scale: 8.0 2024-10-08 01:33:00,678 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=30216.0, ans=0.125 2024-10-08 01:33:03,181 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.21 vs. limit=15.0 2024-10-08 01:33:12,321 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 5.179e+02 7.030e+02 1.045e+03 1.612e+03 3.663e+03, threshold=2.090e+03, percent-clipped=3.0 2024-10-08 01:33:12,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=30216.0, ans=0.125 2024-10-08 01:33:24,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=30336.0, ans=6.0 2024-10-08 01:33:34,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=30336.0, ans=0.125 2024-10-08 01:33:56,939 INFO [train.py:1136] (0/2) Epoch 4, batch 100, loss[loss=0.2915, simple_loss=0.371, pruned_loss=0.106, over 87438.00 frames. ], tot_loss[loss=0.3136, simple_loss=0.3831, pruned_loss=0.122, over 6789570.96 frames. ], batch size: 490, lr: 3.82e-02, grad_scale: 8.0 2024-10-08 01:33:57,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=30576.0, ans=0.0 2024-10-08 01:34:01,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=30576.0, ans=0.004222608695652174 2024-10-08 01:34:04,399 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=30576.0, ans=0.004222608695652174 2024-10-08 01:34:09,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=30576.0, ans=0.0 2024-10-08 01:34:36,578 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.39 vs. limit=15.0 2024-10-08 01:34:45,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=30816.0, ans=0.1 2024-10-08 01:34:59,746 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.84 vs. limit=22.5 2024-10-08 01:35:05,152 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.16 vs. limit=8.0 2024-10-08 01:35:13,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=31056.0, ans=0.2 2024-10-08 01:35:30,349 INFO [train.py:1136] (0/2) Epoch 4, batch 150, loss[loss=0.3399, simple_loss=0.4103, pruned_loss=0.1347, over 85203.00 frames. ], tot_loss[loss=0.3122, simple_loss=0.3816, pruned_loss=0.1214, over 9094653.95 frames. ], batch size: 866, lr: 3.81e-02, grad_scale: 4.0 2024-10-08 01:35:31,162 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.82 vs. limit=12.0 2024-10-08 01:35:44,120 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.48 vs. 
limit=15.0 2024-10-08 01:35:54,517 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=26.98 vs. limit=22.5 2024-10-08 01:36:04,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=31296.0, ans=0.0 2024-10-08 01:36:10,428 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.71 vs. limit=22.5 2024-10-08 01:36:23,004 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 5.370e+02 8.056e+02 1.059e+03 1.922e+03 5.249e+03, threshold=2.118e+03, percent-clipped=23.0 2024-10-08 01:36:43,601 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=31536.0, ans=0.0 2024-10-08 01:37:06,721 INFO [train.py:1136] (0/2) Epoch 4, batch 200, loss[loss=0.2839, simple_loss=0.365, pruned_loss=0.1014, over 87365.00 frames. ], tot_loss[loss=0.3114, simple_loss=0.3814, pruned_loss=0.1208, over 10872081.84 frames. ], batch size: 439, lr: 3.81e-02, grad_scale: 8.0 2024-10-08 01:37:14,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=31776.0, ans=0.003961739130434783 2024-10-08 01:37:28,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=31896.0, ans=0.2 2024-10-08 01:37:40,559 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=5.65 vs. limit=12.0 2024-10-08 01:37:43,916 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=32016.0, ans=0.05 2024-10-08 01:38:40,398 INFO [train.py:1136] (0/2) Epoch 4, batch 250, loss[loss=0.2857, simple_loss=0.3657, pruned_loss=0.1029, over 87363.00 frames. ], tot_loss[loss=0.3095, simple_loss=0.38, pruned_loss=0.1196, over 12274678.55 frames. ], batch size: 464, lr: 3.80e-02, grad_scale: 4.0 2024-10-08 01:39:09,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=32496.0, ans=0.125 2024-10-08 01:39:36,034 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 5.713e+02 8.632e+02 1.176e+03 1.982e+03 4.478e+03, threshold=2.353e+03, percent-clipped=22.0 2024-10-08 01:40:16,662 INFO [train.py:1136] (0/2) Epoch 4, batch 300, loss[loss=0.2752, simple_loss=0.3503, pruned_loss=0.1, over 87269.00 frames. ], tot_loss[loss=0.3071, simple_loss=0.3777, pruned_loss=0.1183, over 13305659.29 frames. ], batch size: 264, lr: 3.80e-02, grad_scale: 8.0 2024-10-08 01:40:18,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=32976.0, ans=0.1 2024-10-08 01:41:04,178 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=33216.0, ans=0.025 2024-10-08 01:41:25,916 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.51 vs. limit=15.0 2024-10-08 01:41:51,279 INFO [train.py:1136] (0/2) Epoch 4, batch 350, loss[loss=0.2839, simple_loss=0.356, pruned_loss=0.1059, over 86094.00 frames. ], tot_loss[loss=0.3056, simple_loss=0.377, pruned_loss=0.1171, over 14172093.39 frames. 
], batch size: 197, lr: 3.80e-02, grad_scale: 8.0 2024-10-08 01:42:04,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=33576.0, ans=0.125 2024-10-08 01:42:48,873 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.857e+02 7.248e+02 9.259e+02 1.450e+03 3.617e+03, threshold=1.852e+03, percent-clipped=9.0 2024-10-08 01:42:59,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=33936.0, ans=0.003492173913043478 2024-10-08 01:42:59,633 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-10-08 01:43:28,029 INFO [train.py:1136] (0/2) Epoch 4, batch 400, loss[loss=0.3633, simple_loss=0.4225, pruned_loss=0.1521, over 83486.00 frames. ], tot_loss[loss=0.3062, simple_loss=0.3775, pruned_loss=0.1175, over 14805944.50 frames. ], batch size: 1077, lr: 3.79e-02, grad_scale: 16.0 2024-10-08 01:43:36,136 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.66 vs. limit=15.0 2024-10-08 01:44:11,136 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=34416.0, ans=0.0 2024-10-08 01:44:24,913 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=34536.0, ans=0.0 2024-10-08 01:45:03,006 INFO [train.py:1136] (0/2) Epoch 4, batch 450, loss[loss=0.2798, simple_loss=0.3535, pruned_loss=0.103, over 86573.00 frames. ], tot_loss[loss=0.3059, simple_loss=0.3774, pruned_loss=0.1171, over 15320427.04 frames. ], batch size: 229, lr: 3.79e-02, grad_scale: 8.0 2024-10-08 01:45:07,660 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.28 vs. limit=15.0 2024-10-08 01:45:08,687 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-10-08 01:45:12,646 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.05 vs. limit=15.0 2024-10-08 01:45:33,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=34896.0, ans=0.125 2024-10-08 01:45:55,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=35136.0, ans=0.1 2024-10-08 01:45:59,556 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.717e+02 7.799e+02 1.171e+03 1.619e+03 4.619e+03, threshold=2.343e+03, percent-clipped=19.0 2024-10-08 01:46:05,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=35136.0, ans=0.2 2024-10-08 01:46:07,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=35136.0, ans=0.1 2024-10-08 01:46:16,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=35136.0, ans=0.125 2024-10-08 01:46:16,980 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.63 vs. 
limit=10.0 2024-10-08 01:46:36,081 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.51 vs. limit=15.0 2024-10-08 01:46:36,645 INFO [train.py:1136] (0/2) Epoch 4, batch 500, loss[loss=0.2713, simple_loss=0.3478, pruned_loss=0.09736, over 86685.00 frames. ], tot_loss[loss=0.3059, simple_loss=0.3774, pruned_loss=0.1172, over 15713607.33 frames. ], batch size: 229, lr: 3.78e-02, grad_scale: 8.0 2024-10-08 01:46:56,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=35496.0, ans=15.0 2024-10-08 01:47:02,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=35496.0, ans=0.035 2024-10-08 01:47:16,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=35616.0, ans=0.125 2024-10-08 01:48:06,296 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.22 vs. limit=15.0 2024-10-08 01:48:13,923 INFO [train.py:1136] (0/2) Epoch 4, batch 550, loss[loss=0.2698, simple_loss=0.3529, pruned_loss=0.09332, over 87255.00 frames. ], tot_loss[loss=0.3057, simple_loss=0.3772, pruned_loss=0.1171, over 15990301.64 frames. ], batch size: 350, lr: 3.78e-02, grad_scale: 8.0 2024-10-08 01:48:33,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=36096.0, ans=0.125 2024-10-08 01:48:49,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=36216.0, ans=0.125 2024-10-08 01:49:10,509 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 5.322e+02 7.462e+02 1.025e+03 1.619e+03 2.709e+03, threshold=2.051e+03, percent-clipped=11.0 2024-10-08 01:49:16,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=36336.0, ans=0.125 2024-10-08 01:49:22,199 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.58 vs. limit=6.0 2024-10-08 01:49:38,568 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.26 vs. limit=15.0 2024-10-08 01:49:38,927 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.73 vs. limit=12.0 2024-10-08 01:49:43,312 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=36456.0, ans=0.125 2024-10-08 01:49:46,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=36576.0, ans=0.0 2024-10-08 01:49:48,133 INFO [train.py:1136] (0/2) Epoch 4, batch 600, loss[loss=0.273, simple_loss=0.3535, pruned_loss=0.09621, over 87425.00 frames. ], tot_loss[loss=0.3024, simple_loss=0.3748, pruned_loss=0.115, over 16249973.54 frames. 
], batch size: 393, lr: 3.77e-02, grad_scale: 8.0 2024-10-08 01:50:12,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=36696.0, ans=0.125 2024-10-08 01:50:31,909 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.60 vs. limit=12.0 2024-10-08 01:50:58,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=36936.0, ans=0.125 2024-10-08 01:50:59,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=36936.0, ans=0.1 2024-10-08 01:51:05,226 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-10-08 01:51:24,319 INFO [train.py:1136] (0/2) Epoch 4, batch 650, loss[loss=0.2693, simple_loss=0.3418, pruned_loss=0.09837, over 86195.00 frames. ], tot_loss[loss=0.3009, simple_loss=0.3734, pruned_loss=0.1142, over 16447414.98 frames. ], batch size: 197, lr: 3.77e-02, grad_scale: 8.0 2024-10-08 01:51:51,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=37296.0, ans=0.125 2024-10-08 01:51:53,567 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.52 vs. limit=15.0 2024-10-08 01:52:01,032 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.23 vs. limit=15.0 2024-10-08 01:52:04,151 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.04 vs. limit=12.0 2024-10-08 01:52:20,975 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 5.224e+02 8.544e+02 1.229e+03 1.998e+03 6.387e+03, threshold=2.459e+03, percent-clipped=23.0 2024-10-08 01:52:32,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=37536.0, ans=0.125 2024-10-08 01:52:40,992 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.22 vs. limit=15.0 2024-10-08 01:52:45,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=37656.0, ans=0.002683478260869565 2024-10-08 01:52:48,703 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=37656.0, ans=0.125 2024-10-08 01:52:50,613 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.77 vs. limit=12.0 2024-10-08 01:52:53,140 INFO [train.py:1136] (0/2) Epoch 4, batch 700, loss[loss=0.2869, simple_loss=0.3698, pruned_loss=0.102, over 87125.00 frames. ], tot_loss[loss=0.2992, simple_loss=0.3722, pruned_loss=0.1131, over 16613690.99 frames. 
], batch size: 517, lr: 3.77e-02, grad_scale: 8.0 2024-10-08 01:53:11,634 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=37896.0, ans=0.125 2024-10-08 01:53:30,101 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.05 vs. limit=6.0 2024-10-08 01:53:38,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=38016.0, ans=0.0 2024-10-08 01:53:38,861 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=38016.0, ans=0.0 2024-10-08 01:54:05,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=38256.0, ans=0.0 2024-10-08 01:54:17,960 INFO [train.py:1136] (0/2) Epoch 4, batch 750, loss[loss=0.3148, simple_loss=0.3913, pruned_loss=0.1192, over 85560.00 frames. ], tot_loss[loss=0.2999, simple_loss=0.3728, pruned_loss=0.1135, over 16688462.42 frames. ], batch size: 787, lr: 3.76e-02, grad_scale: 8.0 2024-10-08 01:54:32,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=38496.0, ans=0.025 2024-10-08 01:54:35,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=38496.0, ans=0.125 2024-10-08 01:54:48,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=38616.0, ans=0.2 2024-10-08 01:55:08,398 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 5.487e+02 8.900e+02 1.369e+03 2.028e+03 5.114e+03, threshold=2.738e+03, percent-clipped=17.0 2024-10-08 01:55:15,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=38736.0, ans=0.05 2024-10-08 01:55:41,684 INFO [train.py:1136] (0/2) Epoch 4, batch 800, loss[loss=0.2647, simple_loss=0.3471, pruned_loss=0.09119, over 87107.00 frames. ], tot_loss[loss=0.3028, simple_loss=0.375, pruned_loss=0.1152, over 16710926.59 frames. ], batch size: 350, lr: 3.76e-02, grad_scale: 16.0 2024-10-08 01:56:08,743 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-4.pt 2024-10-08 01:57:07,892 INFO [train.py:1136] (0/2) Epoch 5, batch 0, loss[loss=0.3115, simple_loss=0.3903, pruned_loss=0.1163, over 85931.00 frames. ], tot_loss[loss=0.3115, simple_loss=0.3903, pruned_loss=0.1163, over 85931.00 frames. ], batch size: 721, lr: 3.50e-02, grad_scale: 32.0 2024-10-08 01:57:07,893 INFO [train.py:1159] (0/2) Computing validation loss 2024-10-08 01:57:18,801 INFO [train.py:1168] (0/2) Epoch 5, validation: loss=0.2155, simple_loss=0.3322, pruned_loss=0.04944, over 1382211.00 frames. 
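A note on reading the two diagnostics that recur throughout these rows. The training and validation lines report loss next to simple_loss and pruned_loss; in a pruned-transducer recipe the reported loss is typically a weighted sum of the two. A minimal sketch, assuming a simple-loss scale of 0.5 (the helper name and the scale value are illustrative assumptions, not taken from this run's code):

```python
import torch

def combined_loss(simple_loss: torch.Tensor,
                  pruned_loss: torch.Tensor,
                  simple_loss_scale: float = 0.5) -> torch.Tensor:
    # Weighted sum: how the `loss` column can relate to the
    # `simple_loss` and `pruned_loss` columns in the rows above.
    return simple_loss_scale * simple_loss + pruned_loss

# Consistent with the epoch-5 validation row just above:
# 0.5 * 0.3322 + 0.04944 = 0.21554, i.e. the reported loss=0.2155.
print(combined_loss(torch.tensor(0.3322), torch.tensor(0.04944)))
```

The epoch-4 validation row checks out the same way (0.5 * 0.3429 + 0.05707 = 0.2285); earlier rows need not match exactly, since the pruned-loss weight may still be ramping up early in training. The WARNING rows from optim.py are similarly mechanical: they list five grad-norm quartiles (evidently min, 25th percentile, median, 75th percentile, max), and in each instance above the reported threshold equals Clipping_scale times the median quartile, e.g. 2.0 * 1.288e+03 = 2.576e+03 (up to rounding), with percent-clipped presumably the share of recent updates whose gradient norm exceeded that threshold.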
2024-10-08 01:57:18,802 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB
2024-10-08 01:57:33,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=39168.0, ans=22.5
2024-10-08 01:57:44,547 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=39288.0, ans=0.0
2024-10-08 01:58:07,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=39408.0, ans=0.125
2024-10-08 01:58:14,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=39528.0, ans=0.125
2024-10-08 01:58:45,611 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.52 vs. limit=15.0
2024-10-08 01:58:49,771 INFO [train.py:1136] (0/2) Epoch 5, batch 50, loss[loss=0.3044, simple_loss=0.3801, pruned_loss=0.1143, over 86430.00 frames. ], tot_loss[loss=0.2892, simple_loss=0.3664, pruned_loss=0.106, over 3872435.02 frames. ], batch size: 620, lr: 3.49e-02, grad_scale: 32.0
2024-10-08 01:59:16,747 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.671e+02 7.282e+02 9.846e+02 1.480e+03 6.238e+03, threshold=1.969e+03, percent-clipped=6.0
2024-10-08 01:59:50,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=40128.0, ans=0.2
2024-10-08 01:59:55,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=40128.0, ans=0.2
2024-10-08 02:00:14,307 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.02 vs. limit=15.0
2024-10-08 02:00:19,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=40248.0, ans=0.125
2024-10-08 02:00:19,091 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=40248.0, ans=0.0
2024-10-08 02:00:22,285 INFO [train.py:1136] (0/2) Epoch 5, batch 100, loss[loss=0.269, simple_loss=0.346, pruned_loss=0.09599, over 87271.00 frames. ], tot_loss[loss=0.293, simple_loss=0.3692, pruned_loss=0.1085, over 6767953.88 frames. ], batch size: 280, lr: 3.49e-02, grad_scale: 32.0
2024-10-08 02:00:37,285 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=40368.0, ans=0.125
2024-10-08 02:00:48,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=40488.0, ans=0.125
2024-10-08 02:00:52,374 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.88 vs. limit=22.5
2024-10-08 02:00:53,720 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=22.5
2024-10-08 02:01:12,413 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=40608.0, ans=0.025
2024-10-08 02:01:26,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=40728.0, ans=0.0
2024-10-08 02:02:00,452 INFO [train.py:1136] (0/2) Epoch 5, batch 150, loss[loss=0.2567, simple_loss=0.3386, pruned_loss=0.08742, over 86708.00 frames. ], tot_loss[loss=0.2909, simple_loss=0.3675, pruned_loss=0.1072, over 9079154.46 frames. ], batch size: 229, lr: 3.48e-02, grad_scale: 32.0
2024-10-08 02:02:05,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=40968.0, ans=0.125
2024-10-08 02:02:18,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=41088.0, ans=0.1
2024-10-08 02:02:25,411 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-10-08 02:02:26,651 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.605e+02 7.261e+02 9.589e+02 1.358e+03 2.390e+03, threshold=1.918e+03, percent-clipped=9.0
2024-10-08 02:02:42,352 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.93 vs. limit=22.5
2024-10-08 02:02:46,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=41208.0, ans=0.1
2024-10-08 02:02:52,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=41208.0, ans=0.1
2024-10-08 02:02:57,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=41328.0, ans=0.07
2024-10-08 02:03:24,274 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=41448.0, ans=0.035
2024-10-08 02:03:36,718 INFO [train.py:1136] (0/2) Epoch 5, batch 200, loss[loss=0.2641, simple_loss=0.3516, pruned_loss=0.08832, over 87354.00 frames. ], tot_loss[loss=0.2896, simple_loss=0.3665, pruned_loss=0.1064, over 10879245.27 frames. ], batch size: 415, lr: 3.48e-02, grad_scale: 32.0
2024-10-08 02:03:38,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=41568.0, ans=0.125
2024-10-08 02:03:42,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=41568.0, ans=0.0018330434782608696
2024-10-08 02:05:11,624 INFO [train.py:1136] (0/2) Epoch 5, batch 250, loss[loss=0.2737, simple_loss=0.3553, pruned_loss=0.09607, over 87389.00 frames. ], tot_loss[loss=0.2871, simple_loss=0.3644, pruned_loss=0.1049, over 12276588.54 frames. ], batch size: 393, lr: 3.47e-02, grad_scale: 32.0
2024-10-08 02:05:38,267 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 5.263e+02 7.575e+02 9.572e+02 1.413e+03 3.815e+03, threshold=1.914e+03, percent-clipped=5.0
2024-10-08 02:06:08,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=42528.0, ans=0.125
2024-10-08 02:06:11,738 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=42528.0, ans=0.125
2024-10-08 02:06:47,346 INFO [train.py:1136] (0/2) Epoch 5, batch 300, loss[loss=0.2948, simple_loss=0.3726, pruned_loss=0.1085, over 86357.00 frames. ], tot_loss[loss=0.2855, simple_loss=0.363, pruned_loss=0.104, over 13358790.09 frames. ], batch size: 667, lr: 3.47e-02, grad_scale: 16.0
2024-10-08 02:07:00,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=42768.0, ans=0.2
2024-10-08 02:07:12,346 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.96 vs. limit=12.0
2024-10-08 02:08:00,226 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=43248.0, ans=0.125
2024-10-08 02:08:23,962 INFO [train.py:1136] (0/2) Epoch 5, batch 350, loss[loss=0.3776, simple_loss=0.4275, pruned_loss=0.1639, over 78603.00 frames. ], tot_loss[loss=0.285, simple_loss=0.3626, pruned_loss=0.1037, over 14197402.13 frames. ], batch size: 1493, lr: 3.47e-02, grad_scale: 16.0
2024-10-08 02:08:45,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=43488.0, ans=0.025
2024-10-08 02:08:50,500 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.659e+02 7.447e+02 9.896e+02 1.768e+03 4.397e+03, threshold=1.979e+03, percent-clipped=22.0
2024-10-08 02:09:06,670 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=43608.0, ans=0.125
2024-10-08 02:10:01,499 INFO [train.py:1136] (0/2) Epoch 5, batch 400, loss[loss=0.2747, simple_loss=0.3598, pruned_loss=0.09477, over 87343.00 frames. ], tot_loss[loss=0.286, simple_loss=0.3632, pruned_loss=0.1044, over 14799543.84 frames. ], batch size: 415, lr: 3.46e-02, grad_scale: 32.0
2024-10-08 02:10:05,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=43968.0, ans=0.125
2024-10-08 02:10:07,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=43968.0, ans=0.2
2024-10-08 02:10:15,934 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=43968.0, ans=0.125
2024-10-08 02:10:46,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=44208.0, ans=0.125
2024-10-08 02:11:25,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=44448.0, ans=0.0012069565217391297
2024-10-08 02:11:35,671 INFO [train.py:1136] (0/2) Epoch 5, batch 450, loss[loss=0.2618, simple_loss=0.3482, pruned_loss=0.08766, over 87142.00 frames. ], tot_loss[loss=0.2844, simple_loss=0.3619, pruned_loss=0.1035, over 15308304.08 frames. ], batch size: 439, lr: 3.46e-02, grad_scale: 16.0
2024-10-08 02:12:05,989 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.922e+02 8.450e+02 1.163e+03 1.888e+03 3.829e+03, threshold=2.327e+03, percent-clipped=24.0
2024-10-08 02:12:06,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=44688.0, ans=0.125
2024-10-08 02:12:12,055 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=44808.0, ans=0.125
2024-10-08 02:12:46,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=44928.0, ans=0.0
2024-10-08 02:13:02,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=45048.0, ans=0.07
2024-10-08 02:13:10,507 INFO [train.py:1136] (0/2) Epoch 5, batch 500, loss[loss=0.2444, simple_loss=0.3257, pruned_loss=0.08156, over 85583.00 frames. ], tot_loss[loss=0.2845, simple_loss=0.3618, pruned_loss=0.1036, over 15697359.76 frames. ], batch size: 180, lr: 3.45e-02, grad_scale: 16.0
2024-10-08 02:13:18,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=45168.0, ans=0.125
2024-10-08 02:13:39,629 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.09 vs. limit=15.0
2024-10-08 02:13:46,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=45288.0, ans=0.1
2024-10-08 02:14:15,532 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.70 vs. limit=15.0
2024-10-08 02:14:18,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=45528.0, ans=0.0
2024-10-08 02:14:29,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=45648.0, ans=0.000946086956521739
2024-10-08 02:14:47,874 INFO [train.py:1136] (0/2) Epoch 5, batch 550, loss[loss=0.2479, simple_loss=0.3291, pruned_loss=0.08337, over 86469.00 frames. ], tot_loss[loss=0.2832, simple_loss=0.3611, pruned_loss=0.1026, over 16017568.22 frames. ], batch size: 213, lr: 3.45e-02, grad_scale: 16.0
2024-10-08 02:15:00,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=45768.0, ans=0.025
2024-10-08 02:15:20,095 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.687e+02 6.672e+02 8.739e+02 1.089e+03 3.176e+03, threshold=1.748e+03, percent-clipped=3.0
2024-10-08 02:15:26,651 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.57 vs. limit=15.0
2024-10-08 02:16:23,721 INFO [train.py:1136] (0/2) Epoch 5, batch 600, loss[loss=0.2597, simple_loss=0.3463, pruned_loss=0.08659, over 87331.00 frames. ], tot_loss[loss=0.283, simple_loss=0.3608, pruned_loss=0.1026, over 16267154.23 frames. ], batch size: 372, lr: 3.44e-02, grad_scale: 16.0
2024-10-08 02:16:35,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=46368.0, ans=0.05
2024-10-08 02:16:44,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=46488.0, ans=0.125
2024-10-08 02:17:15,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=46608.0, ans=0.1
2024-10-08 02:17:48,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=46848.0, ans=0.2
2024-10-08 02:17:59,098 INFO [train.py:1136] (0/2) Epoch 5, batch 650, loss[loss=0.3122, simple_loss=0.3855, pruned_loss=0.1195, over 83203.00 frames. ], tot_loss[loss=0.2807, simple_loss=0.3588, pruned_loss=0.1013, over 16475934.61 frames. ], batch size: 1077, lr: 3.44e-02, grad_scale: 16.0
2024-10-08 02:18:00,227 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.31 vs. limit=10.0
2024-10-08 02:18:11,158 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=46968.0, ans=0.125
2024-10-08 02:18:25,693 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.818e+02 7.508e+02 1.091e+03 1.804e+03 6.194e+03, threshold=2.183e+03, percent-clipped=28.0
2024-10-08 02:18:25,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=47088.0, ans=0.125
2024-10-08 02:19:09,890 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=47448.0, ans=0.125
2024-10-08 02:19:22,385 INFO [train.py:1136] (0/2) Epoch 5, batch 700, loss[loss=0.2519, simple_loss=0.3345, pruned_loss=0.08468, over 87244.00 frames. ], tot_loss[loss=0.2797, simple_loss=0.3578, pruned_loss=0.1008, over 16629721.17 frames. ], batch size: 264, lr: 3.43e-02, grad_scale: 16.0
2024-10-08 02:19:49,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=47688.0, ans=0.125
2024-10-08 02:20:00,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=47808.0, ans=0.125
2024-10-08 02:20:01,881 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=47808.0, ans=0.125
2024-10-08 02:20:13,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=47928.0, ans=0.125
2024-10-08 02:20:22,828 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/checkpoint-4000.pt
2024-10-08 02:20:51,065 INFO [train.py:1136] (0/2) Epoch 5, batch 750, loss[loss=0.244, simple_loss=0.3305, pruned_loss=0.07879, over 86610.00 frames. ], tot_loss[loss=0.2803, simple_loss=0.3582, pruned_loss=0.1012, over 16711491.72 frames. ], batch size: 229, lr: 3.43e-02, grad_scale: 16.0
2024-10-08 02:21:02,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=48168.0, ans=0.0
2024-10-08 02:21:09,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=48288.0, ans=0.125
2024-10-08 02:21:17,892 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 5.244e+02 7.573e+02 1.038e+03 1.754e+03 3.833e+03, threshold=2.076e+03, percent-clipped=14.0
2024-10-08 02:21:19,990 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=48288.0, ans=0.125
2024-10-08 02:21:34,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=48408.0, ans=0.125
2024-10-08 02:21:43,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=48528.0, ans=0.125
2024-10-08 02:21:44,675 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=48528.0, ans=0.1
2024-10-08 02:21:46,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=48528.0, ans=0.0003199999999999991
2024-10-08 02:21:51,491 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.57 vs. limit=15.0
2024-10-08 02:22:10,402 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.70 vs. limit=22.5
2024-10-08 02:22:14,336 INFO [train.py:1136] (0/2) Epoch 5, batch 800, loss[loss=0.3694, simple_loss=0.422, pruned_loss=0.1584, over 78825.00 frames. ], tot_loss[loss=0.2798, simple_loss=0.3578, pruned_loss=0.1009, over 16779625.66 frames. ], batch size: 1493, lr: 3.42e-02, grad_scale: 32.0
2024-10-08 02:22:26,187 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.47 vs. limit=10.0
2024-10-08 02:22:41,692 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-5.pt
2024-10-08 02:23:39,703 INFO [train.py:1136] (0/2) Epoch 6, batch 0, loss[loss=0.2539, simple_loss=0.3395, pruned_loss=0.08416, over 87292.00 frames. ], tot_loss[loss=0.2539, simple_loss=0.3395, pruned_loss=0.08416, over 87292.00 frames. ], batch size: 350, lr: 3.19e-02, grad_scale: 32.0
2024-10-08 02:23:39,704 INFO [train.py:1159] (0/2) Computing validation loss
2024-10-08 02:23:50,566 INFO [train.py:1168] (0/2) Epoch 6, validation: loss=0.2059, simple_loss=0.3226, pruned_loss=0.04464, over 1382211.00 frames.
2024-10-08 02:23:50,566 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB
2024-10-08 02:24:06,553 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.60 vs. limit=15.0
2024-10-08 02:24:11,273 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=49080.0, ans=0.025
2024-10-08 02:24:23,988 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.19 vs. limit=22.5
2024-10-08 02:24:37,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.whiten.whitening_limit, batch_count=49200.0, ans=12.0
2024-10-08 02:24:38,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=49200.0, ans=0.0
2024-10-08 02:24:47,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=49320.0, ans=0.1
2024-10-08 02:24:47,480 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.13 vs. limit=12.0
2024-10-08 02:25:00,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=49320.0, ans=0.00014782608695652205
2024-10-08 02:25:06,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=49440.0, ans=0.1
2024-10-08 02:25:24,957 INFO [train.py:1136] (0/2) Epoch 6, batch 50, loss[loss=0.2606, simple_loss=0.3477, pruned_loss=0.08674, over 87196.00 frames. ], tot_loss[loss=0.2707, simple_loss=0.3523, pruned_loss=0.09461, over 3876083.44 frames. ], batch size: 439, lr: 3.19e-02, grad_scale: 16.0
2024-10-08 02:25:28,123 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.966e+02 8.397e+02 1.173e+03 1.622e+03 3.924e+03, threshold=2.347e+03, percent-clipped=11.0
2024-10-08 02:25:42,564 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.58 vs. limit=6.0
2024-10-08 02:25:53,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=49680.0, ans=0.1
2024-10-08 02:25:53,824 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=49680.0, ans=0.125
2024-10-08 02:25:56,480 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.94 vs. limit=22.5
2024-10-08 02:26:02,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=49800.0, ans=0.2
2024-10-08 02:26:38,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=50040.0, ans=0.125
2024-10-08 02:26:54,770 INFO [train.py:1136] (0/2) Epoch 6, batch 100, loss[loss=0.2537, simple_loss=0.3362, pruned_loss=0.08558, over 87186.00 frames. ], tot_loss[loss=0.2701, simple_loss=0.3512, pruned_loss=0.09444, over 6821233.94 frames. ], batch size: 330, lr: 3.18e-02, grad_scale: 16.0
2024-10-08 02:27:59,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=50520.0, ans=0.0
2024-10-08 02:28:17,799 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.88 vs. limit=6.0
2024-10-08 02:28:22,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=50640.0, ans=0.125
2024-10-08 02:28:22,577 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=50640.0, ans=0.125
2024-10-08 02:28:32,659 INFO [train.py:1136] (0/2) Epoch 6, batch 150, loss[loss=0.244, simple_loss=0.332, pruned_loss=0.07803, over 87019.00 frames. ], tot_loss[loss=0.2687, simple_loss=0.3503, pruned_loss=0.09353, over 9117002.51 frames. ], batch size: 350, lr: 3.18e-02, grad_scale: 16.0
2024-10-08 02:28:33,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=50760.0, ans=0.0
2024-10-08 02:28:36,053 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.388e+02 5.893e+02 8.325e+02 1.079e+03 1.904e+03, threshold=1.665e+03, percent-clipped=0.0
2024-10-08 02:29:14,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=51000.0, ans=0.0
2024-10-08 02:30:10,238 INFO [train.py:1136] (0/2) Epoch 6, batch 200, loss[loss=0.3361, simple_loss=0.3928, pruned_loss=0.1397, over 69599.00 frames. ], tot_loss[loss=0.2723, simple_loss=0.3524, pruned_loss=0.09611, over 10843352.61 frames. ], batch size: 1960, lr: 3.18e-02, grad_scale: 16.0
2024-10-08 02:30:11,390 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.53 vs. limit=15.0
2024-10-08 02:30:12,517 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=51360.0, ans=0.05
2024-10-08 02:30:17,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=51360.0, ans=0.125
2024-10-08 02:30:42,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=51480.0, ans=0.0
2024-10-08 02:30:46,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=51480.0, ans=0.0
2024-10-08 02:30:48,716 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.83 vs. limit=22.5
2024-10-08 02:31:09,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=51720.0, ans=0.0
2024-10-08 02:31:25,681 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=51840.0, ans=0.125
2024-10-08 02:31:45,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=51960.0, ans=0.2
2024-10-08 02:31:46,321 INFO [train.py:1136] (0/2) Epoch 6, batch 250, loss[loss=0.2885, simple_loss=0.3677, pruned_loss=0.1047, over 85940.00 frames. ], tot_loss[loss=0.2724, simple_loss=0.3525, pruned_loss=0.09615, over 12246429.71 frames. ], batch size: 721, lr: 3.17e-02, grad_scale: 16.0
2024-10-08 02:31:50,036 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.821e+02 6.978e+02 9.662e+02 1.380e+03 3.742e+03, threshold=1.932e+03, percent-clipped=19.0
2024-10-08 02:31:53,861 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=51960.0, ans=0.125
2024-10-08 02:32:25,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=52200.0, ans=0.125
2024-10-08 02:32:41,960 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.67 vs. limit=15.0
2024-10-08 02:32:45,113 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=52320.0, ans=0.05
2024-10-08 02:32:59,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=52440.0, ans=0.0
2024-10-08 02:33:14,076 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=52440.0, ans=0.125
2024-10-08 02:33:20,745 INFO [train.py:1136] (0/2) Epoch 6, batch 300, loss[loss=0.3348, simple_loss=0.3921, pruned_loss=0.1387, over 69216.00 frames. ], tot_loss[loss=0.2704, simple_loss=0.3511, pruned_loss=0.09484, over 13313299.75 frames. ], batch size: 1960, lr: 3.17e-02, grad_scale: 16.0
2024-10-08 02:34:11,326 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=52800.0, ans=0.1
2024-10-08 02:34:19,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=52920.0, ans=0.125
2024-10-08 02:34:21,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=52920.0, ans=0.125
2024-10-08 02:34:49,539 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=53040.0, ans=0.125
2024-10-08 02:34:57,760 INFO [train.py:1136] (0/2) Epoch 6, batch 350, loss[loss=0.2522, simple_loss=0.3418, pruned_loss=0.0813, over 87444.00 frames. ], tot_loss[loss=0.2714, simple_loss=0.3517, pruned_loss=0.09557, over 14118761.96 frames. ], batch size: 464, lr: 3.16e-02, grad_scale: 16.0
2024-10-08 02:35:01,397 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.420e+02 6.580e+02 9.619e+02 1.365e+03 4.500e+03, threshold=1.924e+03, percent-clipped=10.0
2024-10-08 02:35:15,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=53160.0, ans=0.0
2024-10-08 02:35:48,470 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=53400.0, ans=0.025
2024-10-08 02:35:56,079 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=53520.0, ans=0.125
2024-10-08 02:36:03,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=53520.0, ans=6.0
2024-10-08 02:36:03,700 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.31 vs. limit=6.0
2024-10-08 02:36:18,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=53640.0, ans=0.125
2024-10-08 02:36:33,512 INFO [train.py:1136] (0/2) Epoch 6, batch 400, loss[loss=0.2589, simple_loss=0.3444, pruned_loss=0.08668, over 87215.00 frames. ], tot_loss[loss=0.2716, simple_loss=0.3522, pruned_loss=0.09544, over 14779626.33 frames. ], batch size: 393, lr: 3.16e-02, grad_scale: 32.0
2024-10-08 02:36:34,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=53760.0, ans=10.0
2024-10-08 02:37:00,244 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=53880.0, ans=0.125
2024-10-08 02:37:05,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=53880.0, ans=0.125
2024-10-08 02:37:12,815 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-10-08 02:37:35,408 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.12 vs. limit=15.0
2024-10-08 02:37:39,860 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=54120.0, ans=0.0
2024-10-08 02:37:51,087 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.24 vs. limit=15.0
2024-10-08 02:38:09,641 INFO [train.py:1136] (0/2) Epoch 6, batch 450, loss[loss=0.255, simple_loss=0.3368, pruned_loss=0.0866, over 87281.00 frames. ], tot_loss[loss=0.2726, simple_loss=0.3533, pruned_loss=0.09595, over 15259460.23 frames. ], batch size: 264, lr: 3.15e-02, grad_scale: 32.0
2024-10-08 02:38:13,142 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.636e+02 6.850e+02 8.583e+02 1.262e+03 2.609e+03, threshold=1.717e+03, percent-clipped=6.0
2024-10-08 02:38:32,076 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=54480.0, ans=0.2
2024-10-08 02:38:40,443 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=54480.0, ans=0.125
2024-10-08 02:38:50,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=54600.0, ans=0.125
2024-10-08 02:38:56,898 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.85 vs. limit=5.0
2024-10-08 02:39:14,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=54720.0, ans=0.2
2024-10-08 02:39:28,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=54840.0, ans=0.125
2024-10-08 02:39:31,140 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.99 vs. limit=15.0
2024-10-08 02:39:31,835 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=54840.0, ans=0.1
2024-10-08 02:39:43,654 INFO [train.py:1136] (0/2) Epoch 6, batch 500, loss[loss=0.2512, simple_loss=0.3343, pruned_loss=0.08406, over 87212.00 frames. ], tot_loss[loss=0.2727, simple_loss=0.3532, pruned_loss=0.09604, over 15640132.98 frames. ], batch size: 280, lr: 3.15e-02, grad_scale: 16.0
2024-10-08 02:39:58,827 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.04 vs. limit=10.0
2024-10-08 02:40:30,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=55200.0, ans=0.2
2024-10-08 02:40:38,675 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=55320.0, ans=0.0
2024-10-08 02:40:41,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=55320.0, ans=0.0
2024-10-08 02:41:18,283 INFO [train.py:1136] (0/2) Epoch 6, batch 550, loss[loss=0.25, simple_loss=0.3288, pruned_loss=0.08559, over 87209.00 frames. ], tot_loss[loss=0.2716, simple_loss=0.3521, pruned_loss=0.09554, over 15967336.86 frames. ], batch size: 264, lr: 3.14e-02, grad_scale: 16.0
2024-10-08 02:41:21,883 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=55560.0, ans=0.2
2024-10-08 02:41:23,142 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 5.286e+02 7.274e+02 1.230e+03 1.832e+03 3.735e+03, threshold=2.459e+03, percent-clipped=28.0
2024-10-08 02:41:38,385 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.62 vs. limit=15.0
2024-10-08 02:42:23,195 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=55920.0, ans=0.0
2024-10-08 02:42:42,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=56040.0, ans=0.07
2024-10-08 02:42:55,120 INFO [train.py:1136] (0/2) Epoch 6, batch 600, loss[loss=0.2451, simple_loss=0.3228, pruned_loss=0.08367, over 85685.00 frames. ], tot_loss[loss=0.2698, simple_loss=0.3507, pruned_loss=0.09449, over 16230993.63 frames. ], batch size: 180, lr: 3.14e-02, grad_scale: 16.0
2024-10-08 02:43:28,498 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=56280.0, ans=0.125
2024-10-08 02:43:30,040 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=56400.0, ans=0.1
2024-10-08 02:43:31,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=56400.0, ans=0.0
2024-10-08 02:43:42,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=56400.0, ans=0.125
2024-10-08 02:44:05,401 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=56520.0, ans=0.0
2024-10-08 02:44:29,685 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.03 vs. limit=6.0
2024-10-08 02:44:30,109 INFO [train.py:1136] (0/2) Epoch 6, batch 650, loss[loss=0.2611, simple_loss=0.3433, pruned_loss=0.08947, over 87285.00 frames. ], tot_loss[loss=0.2692, simple_loss=0.35, pruned_loss=0.09419, over 16381860.93 frames. ], batch size: 280, lr: 3.13e-02, grad_scale: 8.0
2024-10-08 02:44:36,652 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.546e+02 6.265e+02 8.278e+02 1.149e+03 2.996e+03, threshold=1.656e+03, percent-clipped=4.0
2024-10-08 02:44:46,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=56880.0, ans=0.1
2024-10-08 02:44:46,503 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=56880.0, ans=0.1
2024-10-08 02:44:58,228 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.87 vs. limit=6.0
2024-10-08 02:45:16,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=57120.0, ans=0.125
2024-10-08 02:45:34,228 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.63 vs. limit=15.0
2024-10-08 02:45:37,156 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.11 vs. limit=22.5
2024-10-08 02:45:50,234 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.88 vs. limit=10.0
2024-10-08 02:45:50,971 INFO [train.py:1136] (0/2) Epoch 6, batch 700, loss[loss=0.2478, simple_loss=0.3352, pruned_loss=0.08015, over 87342.00 frames. ], tot_loss[loss=0.2673, simple_loss=0.3489, pruned_loss=0.0929, over 16565695.96 frames. ], batch size: 393, lr: 3.13e-02, grad_scale: 8.0
2024-10-08 02:46:14,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=57480.0, ans=0.0
2024-10-08 02:46:26,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=57600.0, ans=0.125
2024-10-08 02:46:50,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=57720.0, ans=0.0
2024-10-08 02:47:14,931 INFO [train.py:1136] (0/2) Epoch 6, batch 750, loss[loss=0.2401, simple_loss=0.3317, pruned_loss=0.07423, over 87542.00 frames. ], tot_loss[loss=0.269, simple_loss=0.3503, pruned_loss=0.09388, over 16639743.53 frames. ], batch size: 393, lr: 3.12e-02, grad_scale: 8.0
2024-10-08 02:47:19,872 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=57960.0, ans=0.2
2024-10-08 02:47:21,116 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.678e+02 6.887e+02 8.742e+02 1.042e+03 2.270e+03, threshold=1.748e+03, percent-clipped=2.0
2024-10-08 02:47:23,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=57960.0, ans=0.0
2024-10-08 02:47:31,233 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.96 vs. limit=22.5
2024-10-08 02:47:47,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=58200.0, ans=0.0
2024-10-08 02:48:00,864 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.85 vs. limit=22.5
2024-10-08 02:48:02,452 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.02 vs. limit=10.0
2024-10-08 02:48:09,834 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=58320.0, ans=0.125
2024-10-08 02:48:35,898 INFO [train.py:1136] (0/2) Epoch 6, batch 800, loss[loss=0.2468, simple_loss=0.3372, pruned_loss=0.07824, over 87368.00 frames. ], tot_loss[loss=0.2684, simple_loss=0.3498, pruned_loss=0.09347, over 16732322.45 frames. ], batch size: 439, lr: 3.12e-02, grad_scale: 16.0
2024-10-08 02:48:41,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=58560.0, ans=0.125
2024-10-08 02:49:01,436 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-6.pt
2024-10-08 02:50:14,307 INFO [train.py:1136] (0/2) Epoch 7, batch 0, loss[loss=0.242, simple_loss=0.3279, pruned_loss=0.07801, over 86644.00 frames. ], tot_loss[loss=0.242, simple_loss=0.3279, pruned_loss=0.07801, over 86644.00 frames. ], batch size: 246, lr: 2.92e-02, grad_scale: 32.0
2024-10-08 02:50:14,309 INFO [train.py:1159] (0/2) Computing validation loss
2024-10-08 02:50:25,367 INFO [train.py:1168] (0/2) Epoch 7, validation: loss=0.1952, simple_loss=0.312, pruned_loss=0.03925, over 1382211.00 frames.
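A note on the recurring optim.py WARNINGs: the five numbers read as min/25%/50%/75%/max of recent gradient norms, the reported threshold equals Clipping_scale times the median (e.g. 2.0 x 8.325e+02 = 1.665e+03 in the Epoch 6, batch 150 warning above), and percent-clipped is the share of recent batches over that threshold. The sketch below reconstructs that bookkeeping under those assumptions; the window size of 128 is invented, and this is not the actual icefall optimizer code.

```python
import torch

def clip_grad_with_median_threshold(
    model: torch.nn.Module,
    norm_history: list[float],
    clipping_scale: float = 2.0,
    window: int = 128,  # assumed; the real window size is not in this log
) -> dict:
    # Measure the global gradient norm without clipping anything yet.
    total_norm = float(
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=float("inf"))
    )
    norm_history.append(total_norm)
    del norm_history[:-window]  # keep only the most recent `window` norms

    t = torch.tensor(norm_history)
    quartiles = torch.quantile(t, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * quartiles[2].item()  # scale * median

    if total_norm > threshold:
        # Rescale all gradients so the global norm equals the threshold.
        for p in model.parameters():
            if p.grad is not None:
                p.grad.mul_(threshold / total_norm)

    clipped = sum(n > threshold for n in norm_history)
    return {
        "quartiles": quartiles.tolist(),
        "threshold": threshold,
        "percent_clipped": 100.0 * clipped / len(norm_history),
    }
```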
2024-10-08 02:50:25,367 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB
2024-10-08 02:50:30,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=58752.0, ans=0.0
2024-10-08 02:50:30,968 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.66 vs. limit=15.0
2024-10-08 02:50:43,305 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.70 vs. limit=15.0
2024-10-08 02:51:17,128 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=58992.0, ans=0.0
2024-10-08 02:51:38,689 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.126e+02 6.056e+02 8.233e+02 1.114e+03 3.871e+03, threshold=1.647e+03, percent-clipped=9.0
2024-10-08 02:51:45,972 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.68 vs. limit=15.0
2024-10-08 02:51:55,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=59232.0, ans=0.125
2024-10-08 02:52:01,986 INFO [train.py:1136] (0/2) Epoch 7, batch 50, loss[loss=0.2434, simple_loss=0.3336, pruned_loss=0.07665, over 87283.00 frames. ], tot_loss[loss=0.2581, simple_loss=0.3432, pruned_loss=0.08652, over 3896098.59 frames. ], batch size: 415, lr: 2.92e-02, grad_scale: 32.0
2024-10-08 02:52:02,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=59352.0, ans=0.0
2024-10-08 02:52:10,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=59352.0, ans=0.025
2024-10-08 02:52:22,129 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.31 vs. limit=15.0
2024-10-08 02:53:38,597 INFO [train.py:1136] (0/2) Epoch 7, batch 100, loss[loss=0.3125, simple_loss=0.3845, pruned_loss=0.1202, over 81908.00 frames. ], tot_loss[loss=0.2616, simple_loss=0.345, pruned_loss=0.08908, over 6807714.44 frames. ], batch size: 1245, lr: 2.91e-02, grad_scale: 32.0
2024-10-08 02:53:58,386 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.33 vs. limit=15.0
2024-10-08 02:54:11,126 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.52 vs. limit=15.0
2024-10-08 02:54:23,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=60192.0, ans=0.0
2024-10-08 02:54:48,298 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.664e+02 6.597e+02 8.829e+02 1.274e+03 3.277e+03, threshold=1.766e+03, percent-clipped=14.0
2024-10-08 02:55:07,088 INFO [train.py:1136] (0/2) Epoch 7, batch 150, loss[loss=0.2504, simple_loss=0.3311, pruned_loss=0.08486, over 87254.00 frames. ], tot_loss[loss=0.2614, simple_loss=0.3442, pruned_loss=0.08933, over 9085686.01 frames. ], batch size: 296, lr: 2.91e-02, grad_scale: 16.0
2024-10-08 02:55:09,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=60552.0, ans=0.1
2024-10-08 02:55:34,477 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=60672.0, ans=0.04949747468305833
2024-10-08 02:56:37,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=61032.0, ans=0.025
2024-10-08 02:56:42,251 INFO [train.py:1136] (0/2) Epoch 7, batch 200, loss[loss=0.3124, simple_loss=0.3853, pruned_loss=0.1198, over 81735.00 frames. ], tot_loss[loss=0.2599, simple_loss=0.3432, pruned_loss=0.08835, over 10889462.02 frames. ], batch size: 1245, lr: 2.90e-02, grad_scale: 16.0
2024-10-08 02:57:46,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=61512.0, ans=0.125
2024-10-08 02:57:51,604 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.574e+02 6.694e+02 8.375e+02 1.090e+03 2.390e+03, threshold=1.675e+03, percent-clipped=3.0
2024-10-08 02:58:16,111 INFO [train.py:1136] (0/2) Epoch 7, batch 250, loss[loss=0.2538, simple_loss=0.3397, pruned_loss=0.08402, over 87534.00 frames. ], tot_loss[loss=0.2592, simple_loss=0.3424, pruned_loss=0.08799, over 12274800.11 frames. ], batch size: 372, lr: 2.90e-02, grad_scale: 16.0
2024-10-08 02:58:42,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=61872.0, ans=0.125
2024-10-08 02:59:07,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=61992.0, ans=0.125
2024-10-08 02:59:14,972 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=62112.0, ans=0.0
2024-10-08 02:59:20,318 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=62112.0, ans=0.2
2024-10-08 02:59:47,206 INFO [train.py:1136] (0/2) Epoch 7, batch 300, loss[loss=0.2675, simple_loss=0.3513, pruned_loss=0.09185, over 86320.00 frames. ], tot_loss[loss=0.2594, simple_loss=0.3424, pruned_loss=0.08817, over 13334731.72 frames. ], batch size: 667, lr: 2.90e-02, grad_scale: 16.0
2024-10-08 02:59:57,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=62352.0, ans=0.0
2024-10-08 03:00:01,840 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.67 vs. limit=10.0
2024-10-08 03:00:18,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=62472.0, ans=0.125
2024-10-08 03:00:36,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=62592.0, ans=0.07
2024-10-08 03:00:50,383 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.15 vs. limit=15.0
2024-10-08 03:01:02,023 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.18 vs. limit=10.0
2024-10-08 03:01:04,404 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.291e+02 6.768e+02 8.645e+02 1.100e+03 2.050e+03, threshold=1.729e+03, percent-clipped=3.0
2024-10-08 03:01:04,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=62832.0, ans=0.125
2024-10-08 03:01:22,625 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=62952.0, ans=0.125
2024-10-08 03:01:23,912 INFO [train.py:1136] (0/2) Epoch 7, batch 350, loss[loss=0.3267, simple_loss=0.3873, pruned_loss=0.133, over 69051.00 frames. ], tot_loss[loss=0.2598, simple_loss=0.3428, pruned_loss=0.08844, over 14145138.96 frames. ], batch size: 1960, lr: 2.89e-02, grad_scale: 16.0
2024-10-08 03:01:28,506 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.39 vs. limit=12.0
2024-10-08 03:02:42,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=63432.0, ans=0.0
2024-10-08 03:02:57,789 INFO [train.py:1136] (0/2) Epoch 7, batch 400, loss[loss=0.2473, simple_loss=0.3377, pruned_loss=0.0784, over 87415.00 frames. ], tot_loss[loss=0.2592, simple_loss=0.3425, pruned_loss=0.08795, over 14828141.99 frames. ], batch size: 490, lr: 2.89e-02, grad_scale: 32.0
2024-10-08 03:03:11,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=63552.0, ans=0.0
2024-10-08 03:03:17,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=63672.0, ans=0.125
2024-10-08 03:03:21,015 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=63672.0, ans=0.125
2024-10-08 03:03:28,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=63672.0, ans=0.125
2024-10-08 03:03:30,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=63672.0, ans=0.0
2024-10-08 03:03:46,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=63792.0, ans=0.0
2024-10-08 03:03:47,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=63792.0, ans=0.125
2024-10-08 03:03:49,828 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.87 vs. limit=15.0
2024-10-08 03:04:09,040 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=63912.0, ans=0.95
2024-10-08 03:04:13,520 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.646e+02 5.951e+02 7.101e+02 1.007e+03 2.958e+03, threshold=1.420e+03, percent-clipped=5.0
2024-10-08 03:04:15,770 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=64032.0, ans=0.2
2024-10-08 03:04:33,245 INFO [train.py:1136] (0/2) Epoch 7, batch 450, loss[loss=0.221, simple_loss=0.3091, pruned_loss=0.06646, over 86677.00 frames. ], tot_loss[loss=0.2575, simple_loss=0.3411, pruned_loss=0.08695, over 15351714.71 frames. ], batch size: 213, lr: 2.88e-02, grad_scale: 16.0
2024-10-08 03:04:39,214 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=64152.0, ans=0.0
2024-10-08 03:04:52,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=64272.0, ans=0.125
2024-10-08 03:05:01,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=64272.0, ans=0.0
2024-10-08 03:05:02,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=64272.0, ans=0.05
2024-10-08 03:05:04,395 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=64272.0, ans=0.0
2024-10-08 03:05:25,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=64392.0, ans=0.0
2024-10-08 03:05:27,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=64392.0, ans=0.125
2024-10-08 03:05:48,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=64632.0, ans=0.125
2024-10-08 03:05:55,515 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=64632.0, ans=0.125
2024-10-08 03:05:58,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=64632.0, ans=0.125
2024-10-08 03:06:05,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=64752.0, ans=15.0
2024-10-08 03:06:06,629 INFO [train.py:1136] (0/2) Epoch 7, batch 500, loss[loss=0.2669, simple_loss=0.3524, pruned_loss=0.09071, over 86999.00 frames. ], tot_loss[loss=0.255, simple_loss=0.3392, pruned_loss=0.08536, over 15795721.93 frames. ], batch size: 583, lr: 2.88e-02, grad_scale: 16.0
2024-10-08 03:06:51,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=64992.0, ans=0.125
2024-10-08 03:07:06,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=65112.0, ans=0.125
2024-10-08 03:07:24,373 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 5.076e+02 7.204e+02 8.619e+02 1.177e+03 2.888e+03, threshold=1.724e+03, percent-clipped=13.0
2024-10-08 03:07:30,081 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=65232.0, ans=0.125
2024-10-08 03:07:41,445 INFO [train.py:1136] (0/2) Epoch 7, batch 550, loss[loss=0.2353, simple_loss=0.3225, pruned_loss=0.07411, over 87050.00 frames. ], tot_loss[loss=0.2545, simple_loss=0.339, pruned_loss=0.08503, over 16119921.35 frames. ], batch size: 350, lr: 2.87e-02, grad_scale: 16.0
2024-10-08 03:07:41,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=65352.0, ans=0.2
2024-10-08 03:08:17,670 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=65472.0, ans=0.1
2024-10-08 03:08:29,675 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=65592.0, ans=0.125
2024-10-08 03:08:30,285 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.56 vs. limit=22.5
2024-10-08 03:08:33,283 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-10-08 03:08:57,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=65832.0, ans=0.95
2024-10-08 03:09:01,586 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.56 vs. limit=15.0
2024-10-08 03:09:04,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=65832.0, ans=0.125
2024-10-08 03:09:17,155 INFO [train.py:1136] (0/2) Epoch 7, batch 600, loss[loss=0.2408, simple_loss=0.3241, pruned_loss=0.07872, over 87191.00 frames. ], tot_loss[loss=0.2549, simple_loss=0.3391, pruned_loss=0.08533, over 16342183.98 frames. ], batch size: 296, lr: 2.87e-02, grad_scale: 16.0
2024-10-08 03:09:29,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=65952.0, ans=0.125
2024-10-08 03:10:31,227 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.343e+02 5.831e+02 7.254e+02 1.001e+03 2.022e+03, threshold=1.451e+03, percent-clipped=2.0
2024-10-08 03:10:41,221 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.64 vs. limit=22.5
2024-10-08 03:10:49,024 INFO [train.py:1136] (0/2) Epoch 7, batch 650, loss[loss=0.2315, simple_loss=0.3134, pruned_loss=0.07485, over 86178.00 frames. ], tot_loss[loss=0.2557, simple_loss=0.3398, pruned_loss=0.08576, over 16518534.65 frames. ], batch size: 197, lr: 2.86e-02, grad_scale: 16.0
2024-10-08 03:10:51,607 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.54 vs. limit=15.0
2024-10-08 03:11:37,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=66792.0, ans=0.125
2024-10-08 03:11:59,405 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.08 vs. limit=22.5
2024-10-08 03:12:10,668 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.07 vs. limit=15.0
2024-10-08 03:12:20,858 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=67152.0, ans=0.125
2024-10-08 03:12:21,976 INFO [train.py:1136] (0/2) Epoch 7, batch 700, loss[loss=0.259, simple_loss=0.348, pruned_loss=0.08503, over 85809.00 frames. ], tot_loss[loss=0.2562, simple_loss=0.3404, pruned_loss=0.08601, over 16634372.75 frames. ], batch size: 721, lr: 2.86e-02, grad_scale: 16.0
2024-10-08 03:12:30,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=67152.0, ans=0.125
2024-10-08 03:13:24,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=67512.0, ans=0.125
2024-10-08 03:13:28,505 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.366e+02 6.856e+02 8.614e+02 1.206e+03 2.644e+03, threshold=1.723e+03, percent-clipped=19.0
2024-10-08 03:13:38,396 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=67632.0, ans=0.125
2024-10-08 03:13:41,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=67632.0, ans=0.125
2024-10-08 03:13:44,249 INFO [train.py:1136] (0/2) Epoch 7, batch 750, loss[loss=0.2322, simple_loss=0.322, pruned_loss=0.0712, over 87377.00 frames. ], tot_loss[loss=0.2559, simple_loss=0.3404, pruned_loss=0.08566, over 16764074.75 frames. ], batch size: 372, lr: 2.86e-02, grad_scale: 16.0
2024-10-08 03:13:45,107 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.29 vs. limit=10.0
2024-10-08 03:14:26,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=67992.0, ans=0.0
2024-10-08 03:14:31,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=67992.0, ans=0.1
2024-10-08 03:15:04,962 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=68232.0, ans=0.125
2024-10-08 03:15:07,812 INFO [train.py:1136] (0/2) Epoch 7, batch 800, loss[loss=0.2427, simple_loss=0.334, pruned_loss=0.07565, over 87327.00 frames. ], tot_loss[loss=0.2582, simple_loss=0.3423, pruned_loss=0.08706, over 16786487.02 frames. ], batch size: 464, lr: 2.85e-02, grad_scale: 32.0
2024-10-08 03:15:17,477 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.69 vs. limit=6.0
2024-10-08 03:15:34,171 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-7.pt
2024-10-08 03:16:27,297 INFO [train.py:1136] (0/2) Epoch 8, batch 0, loss[loss=0.277, simple_loss=0.3626, pruned_loss=0.09567, over 87020.00 frames. ], tot_loss[loss=0.277, simple_loss=0.3626, pruned_loss=0.09567, over 87020.00 frames. ], batch size: 583, lr: 2.68e-02, grad_scale: 32.0
2024-10-08 03:16:27,299 INFO [train.py:1159] (0/2) Computing validation loss
2024-10-08 03:16:38,372 INFO [train.py:1168] (0/2) Epoch 8, validation: loss=0.1943, simple_loss=0.3113, pruned_loss=0.03862, over 1382211.00 frames.
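A note on the Whitening lines: each compares a per-module statistic of the output covariance against a scheduled limit; values near 1.0 mean the per-group covariance is close to isotropic ("white"), and larger values mean an uneven eigenvalue spectrum. The function below is a hedged reconstruction of one such metric, d * trace(C^2) / trace(C)^2 averaged over channel groups; it is not copied from icefall's scaling.py, and the contiguous channel grouping is an assumption.

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    """How far feature covariance is from isotropic, per channel group.

    x: (num_frames, num_channels). Channels are split into num_groups
    contiguous groups; for each group covariance C we compute
    d * trace(C @ C) / trace(C) ** 2, which equals 1.0 when C is a multiple
    of the identity and grows with spectral spread. Group values are averaged.
    """
    n, c = x.shape
    d = c // num_groups
    x = x.reshape(n, num_groups, d).transpose(0, 1)    # (groups, n, d)
    x = x - x.mean(dim=1, keepdim=True)                # centre each group
    covar = torch.matmul(x.transpose(1, 2), x) / n     # (groups, d, d)
    trace = covar.diagonal(dim1=1, dim2=2).sum(dim=1)  # trace(C) per group
    trace_sq = (covar * covar).sum(dim=(1, 2))         # trace(C @ C), C symmetric
    metric = d * trace_sq / (trace * trace + 1e-20)
    return metric.mean().item()

# A random (non-white) linear projection yields a metric well above 1.0:
x = torch.randn(1000, 128) @ torch.randn(128, 128)
print(whitening_metric(x, num_groups=4))
```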
2024-10-08 03:16:38,373 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB 2024-10-08 03:17:27,424 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.498e+02 6.162e+02 8.115e+02 1.184e+03 2.143e+03, threshold=1.623e+03, percent-clipped=5.0 2024-10-08 03:17:58,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=69024.0, ans=0.125 2024-10-08 03:18:14,527 INFO [train.py:1136] (0/2) Epoch 8, batch 50, loss[loss=0.2212, simple_loss=0.312, pruned_loss=0.06517, over 86769.00 frames. ], tot_loss[loss=0.2523, simple_loss=0.3386, pruned_loss=0.08298, over 3882823.49 frames. ], batch size: 229, lr: 2.68e-02, grad_scale: 16.0 2024-10-08 03:18:25,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=69144.0, ans=0.0 2024-10-08 03:18:29,005 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.73 vs. limit=6.0 2024-10-08 03:19:19,674 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=69504.0, ans=0.125 2024-10-08 03:19:47,496 INFO [train.py:1136] (0/2) Epoch 8, batch 100, loss[loss=0.2755, simple_loss=0.36, pruned_loss=0.09547, over 83262.00 frames. ], tot_loss[loss=0.2489, simple_loss=0.3348, pruned_loss=0.08146, over 6833923.36 frames. ], batch size: 1078, lr: 2.68e-02, grad_scale: 8.0 2024-10-08 03:19:59,846 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. limit=6.0 2024-10-08 03:20:09,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=69864.0, ans=0.0 2024-10-08 03:20:21,279 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.09 vs. limit=15.0 2024-10-08 03:20:31,394 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.86 vs. limit=12.0 2024-10-08 03:20:32,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=69984.0, ans=0.125 2024-10-08 03:20:35,867 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.586e+02 6.361e+02 8.256e+02 1.053e+03 1.999e+03, threshold=1.651e+03, percent-clipped=3.0 2024-10-08 03:20:49,699 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.45 vs. limit=15.0 2024-10-08 03:20:53,388 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.74 vs. limit=22.5 2024-10-08 03:21:07,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=70224.0, ans=0.125 2024-10-08 03:21:19,810 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.76 vs. limit=15.0 2024-10-08 03:21:20,374 INFO [train.py:1136] (0/2) Epoch 8, batch 150, loss[loss=0.2706, simple_loss=0.3548, pruned_loss=0.09322, over 85482.00 frames. 
], tot_loss[loss=0.2498, simple_loss=0.3357, pruned_loss=0.08194, over 9100436.35 frames. ], batch size: 787, lr: 2.67e-02, grad_scale: 8.0 2024-10-08 03:22:06,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=70584.0, ans=0.0 2024-10-08 03:22:13,179 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-10-08 03:22:26,664 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=70704.0, ans=0.125 2024-10-08 03:22:47,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=70824.0, ans=0.125 2024-10-08 03:22:52,570 INFO [train.py:1136] (0/2) Epoch 8, batch 200, loss[loss=0.2444, simple_loss=0.3307, pruned_loss=0.07906, over 87296.00 frames. ], tot_loss[loss=0.2505, simple_loss=0.3361, pruned_loss=0.08248, over 10878590.59 frames. ], batch size: 330, lr: 2.67e-02, grad_scale: 8.0 2024-10-08 03:23:24,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=71064.0, ans=0.1 2024-10-08 03:23:24,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=71064.0, ans=0.125 2024-10-08 03:23:25,149 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.34 vs. limit=15.0 2024-10-08 03:23:43,483 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.21 vs. limit=15.0 2024-10-08 03:23:43,809 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.235e+02 5.698e+02 7.332e+02 9.602e+02 1.733e+03, threshold=1.466e+03, percent-clipped=1.0 2024-10-08 03:24:05,636 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.89 vs. limit=22.5 2024-10-08 03:24:27,577 INFO [train.py:1136] (0/2) Epoch 8, batch 250, loss[loss=0.2463, simple_loss=0.3321, pruned_loss=0.08024, over 86858.00 frames. ], tot_loss[loss=0.2499, simple_loss=0.3356, pruned_loss=0.08213, over 12280457.89 frames. ], batch size: 547, lr: 2.66e-02, grad_scale: 8.0 2024-10-08 03:25:03,347 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-10-08 03:25:45,520 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.25 vs. limit=15.0 2024-10-08 03:26:01,263 INFO [train.py:1136] (0/2) Epoch 8, batch 300, loss[loss=0.221, simple_loss=0.3079, pruned_loss=0.06706, over 86557.00 frames. ], tot_loss[loss=0.2487, simple_loss=0.3348, pruned_loss=0.08133, over 13358633.10 frames. 
], batch size: 213, lr: 2.66e-02, grad_scale: 8.0 2024-10-08 03:26:52,226 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.268e+02 5.609e+02 6.403e+02 8.139e+02 1.677e+03, threshold=1.281e+03, percent-clipped=1.0 2024-10-08 03:27:06,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=72504.0, ans=0.125 2024-10-08 03:27:16,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=72504.0, ans=0.1 2024-10-08 03:27:36,481 INFO [train.py:1136] (0/2) Epoch 8, batch 350, loss[loss=0.2635, simple_loss=0.3508, pruned_loss=0.08807, over 86025.00 frames. ], tot_loss[loss=0.2486, simple_loss=0.3349, pruned_loss=0.08117, over 14202729.41 frames. ], batch size: 721, lr: 2.65e-02, grad_scale: 8.0 2024-10-08 03:27:38,652 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=72744.0, ans=0.0 2024-10-08 03:27:49,313 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.51 vs. limit=10.0 2024-10-08 03:28:24,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=72984.0, ans=0.125 2024-10-08 03:28:35,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=73104.0, ans=0.025 2024-10-08 03:28:46,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=73104.0, ans=0.0 2024-10-08 03:29:11,011 INFO [train.py:1136] (0/2) Epoch 8, batch 400, loss[loss=0.216, simple_loss=0.2982, pruned_loss=0.06692, over 85687.00 frames. ], tot_loss[loss=0.2471, simple_loss=0.3335, pruned_loss=0.0803, over 14871982.20 frames. ], batch size: 180, lr: 2.65e-02, grad_scale: 16.0 2024-10-08 03:29:18,161 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=73344.0, ans=0.1 2024-10-08 03:29:30,222 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=73464.0, ans=0.05 2024-10-08 03:29:47,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=73584.0, ans=0.1 2024-10-08 03:30:02,269 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.407e+02 5.492e+02 6.735e+02 8.332e+02 1.263e+03, threshold=1.347e+03, percent-clipped=0.0 2024-10-08 03:30:27,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=73824.0, ans=0.125 2024-10-08 03:30:27,468 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.54 vs. limit=15.0 2024-10-08 03:30:39,980 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.46 vs. limit=22.5 2024-10-08 03:30:42,376 INFO [train.py:1136] (0/2) Epoch 8, batch 450, loss[loss=0.2324, simple_loss=0.322, pruned_loss=0.0714, over 87125.00 frames. ], tot_loss[loss=0.2468, simple_loss=0.3332, pruned_loss=0.08016, over 15384518.56 frames. 
], batch size: 350, lr: 2.65e-02, grad_scale: 16.0 2024-10-08 03:30:45,282 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.17 vs. limit=15.0 2024-10-08 03:31:11,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=74064.0, ans=0.125 2024-10-08 03:31:16,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=74064.0, ans=0.125 2024-10-08 03:31:18,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=74064.0, ans=0.0 2024-10-08 03:31:30,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=74184.0, ans=0.0 2024-10-08 03:32:04,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=74424.0, ans=0.2 2024-10-08 03:32:17,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=74544.0, ans=0.0 2024-10-08 03:32:18,391 INFO [train.py:1136] (0/2) Epoch 8, batch 500, loss[loss=0.2315, simple_loss=0.3191, pruned_loss=0.07194, over 86819.00 frames. ], tot_loss[loss=0.2485, simple_loss=0.3346, pruned_loss=0.08118, over 15742777.44 frames. ], batch size: 246, lr: 2.64e-02, grad_scale: 16.0 2024-10-08 03:32:19,621 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.74 vs. limit=15.0 2024-10-08 03:32:30,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=74544.0, ans=0.125 2024-10-08 03:33:03,484 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=74784.0, ans=0.125 2024-10-08 03:33:11,424 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.698e+02 5.967e+02 7.649e+02 9.422e+02 1.811e+03, threshold=1.530e+03, percent-clipped=4.0 2024-10-08 03:33:44,165 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=75024.0, ans=0.125 2024-10-08 03:33:54,019 INFO [train.py:1136] (0/2) Epoch 8, batch 550, loss[loss=0.2221, simple_loss=0.3111, pruned_loss=0.06657, over 86421.00 frames. ], tot_loss[loss=0.2483, simple_loss=0.3343, pruned_loss=0.08115, over 16033700.07 frames. ], batch size: 246, lr: 2.64e-02, grad_scale: 8.0 2024-10-08 03:33:54,522 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=75144.0, ans=0.125 2024-10-08 03:34:01,447 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=75144.0, ans=0.0 2024-10-08 03:34:28,002 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=75264.0, ans=0.2 2024-10-08 03:34:33,498 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-10-08 03:35:27,029 INFO [train.py:1136] (0/2) Epoch 8, batch 600, loss[loss=0.2319, simple_loss=0.3183, pruned_loss=0.07269, over 87197.00 frames. ], tot_loss[loss=0.2484, simple_loss=0.3345, pruned_loss=0.08116, over 16261277.31 frames. 
], batch size: 330, lr: 2.63e-02, grad_scale: 8.0 2024-10-08 03:35:51,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=75864.0, ans=0.125 2024-10-08 03:36:03,172 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.50 vs. limit=22.5 2024-10-08 03:36:17,823 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.577e+02 5.451e+02 6.305e+02 8.352e+02 1.284e+03, threshold=1.261e+03, percent-clipped=0.0 2024-10-08 03:36:26,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=76104.0, ans=0.0 2024-10-08 03:36:35,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=76104.0, ans=0.125 2024-10-08 03:36:59,259 INFO [train.py:1136] (0/2) Epoch 8, batch 650, loss[loss=0.2478, simple_loss=0.334, pruned_loss=0.08078, over 86984.00 frames. ], tot_loss[loss=0.2489, simple_loss=0.3349, pruned_loss=0.08147, over 16443551.92 frames. ], batch size: 583, lr: 2.63e-02, grad_scale: 8.0 2024-10-08 03:37:45,214 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=76584.0, ans=0.125 2024-10-08 03:37:57,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=76704.0, ans=0.1 2024-10-08 03:38:08,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=76824.0, ans=0.0 2024-10-08 03:38:26,121 INFO [train.py:1136] (0/2) Epoch 8, batch 700, loss[loss=0.2585, simple_loss=0.3375, pruned_loss=0.08976, over 86802.00 frames. ], tot_loss[loss=0.2494, simple_loss=0.3352, pruned_loss=0.08177, over 16573044.30 frames. ], batch size: 547, lr: 2.62e-02, grad_scale: 8.0 2024-10-08 03:38:46,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=77064.0, ans=0.125 2024-10-08 03:38:54,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=77064.0, ans=0.0 2024-10-08 03:38:54,417 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.22 vs. limit=22.5 2024-10-08 03:39:00,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=77184.0, ans=0.0 2024-10-08 03:39:14,117 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.656e+02 6.306e+02 7.738e+02 9.639e+02 2.127e+03, threshold=1.548e+03, percent-clipped=7.0 2024-10-08 03:39:49,217 INFO [train.py:1136] (0/2) Epoch 8, batch 750, loss[loss=0.223, simple_loss=0.311, pruned_loss=0.06749, over 86621.00 frames. ], tot_loss[loss=0.2471, simple_loss=0.3334, pruned_loss=0.08041, over 16713751.16 frames. 
], batch size: 246, lr: 2.62e-02, grad_scale: 8.0 2024-10-08 03:40:13,184 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=77664.0, ans=0.0 2024-10-08 03:40:21,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=77664.0, ans=0.2 2024-10-08 03:40:30,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=77784.0, ans=0.125 2024-10-08 03:40:39,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=77904.0, ans=0.1 2024-10-08 03:41:16,708 INFO [train.py:1136] (0/2) Epoch 8, batch 800, loss[loss=0.2386, simple_loss=0.3264, pruned_loss=0.0754, over 87201.00 frames. ], tot_loss[loss=0.2524, simple_loss=0.337, pruned_loss=0.08384, over 16640343.92 frames. ], batch size: 350, lr: 2.61e-02, grad_scale: 16.0 2024-10-08 03:41:22,548 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.66 vs. limit=6.0 2024-10-08 03:41:42,219 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-8.pt 2024-10-08 03:42:41,293 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-10-08 03:42:42,349 INFO [train.py:1136] (0/2) Epoch 9, batch 0, loss[loss=0.2646, simple_loss=0.3537, pruned_loss=0.08774, over 85744.00 frames. ], tot_loss[loss=0.2646, simple_loss=0.3537, pruned_loss=0.08774, over 85744.00 frames. ], batch size: 721, lr: 2.47e-02, grad_scale: 32.0 2024-10-08 03:42:42,351 INFO [train.py:1159] (0/2) Computing validation loss 2024-10-08 03:42:55,521 INFO [train.py:1168] (0/2) Epoch 9, validation: loss=0.1885, simple_loss=0.3052, pruned_loss=0.03584, over 1382211.00 frames. 2024-10-08 03:42:55,521 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB 2024-10-08 03:42:57,642 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-10-08 03:43:03,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=78336.0, ans=0.125 2024-10-08 03:43:18,205 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.675e+02 5.858e+02 6.514e+02 8.229e+02 1.284e+03, threshold=1.303e+03, percent-clipped=0.0 2024-10-08 03:43:22,299 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=78456.0, ans=0.0 2024-10-08 03:43:47,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=78576.0, ans=0.0 2024-10-08 03:43:49,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=78576.0, ans=0.125 2024-10-08 03:43:50,841 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-10-08 03:44:08,273 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.39 vs. 
limit=15.0 2024-10-08 03:44:11,075 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=78816.0, ans=0.0 2024-10-08 03:44:29,219 INFO [train.py:1136] (0/2) Epoch 9, batch 50, loss[loss=0.2302, simple_loss=0.3201, pruned_loss=0.07016, over 87436.00 frames. ], tot_loss[loss=0.248, simple_loss=0.3346, pruned_loss=0.08067, over 3826748.11 frames. ], batch size: 372, lr: 2.47e-02, grad_scale: 32.0 2024-10-08 03:44:44,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=78936.0, ans=0.2 2024-10-08 03:45:17,898 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=79176.0, ans=0.125 2024-10-08 03:45:17,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=79176.0, ans=0.125 2024-10-08 03:45:22,317 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.94 vs. limit=15.0 2024-10-08 03:45:32,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=79296.0, ans=0.0 2024-10-08 03:45:55,796 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=79416.0, ans=0.2 2024-10-08 03:46:05,771 INFO [train.py:1136] (0/2) Epoch 9, batch 100, loss[loss=0.2678, simple_loss=0.3526, pruned_loss=0.09147, over 84404.00 frames. ], tot_loss[loss=0.2439, simple_loss=0.331, pruned_loss=0.0784, over 6758310.33 frames. ], batch size: 957, lr: 2.47e-02, grad_scale: 16.0 2024-10-08 03:46:11,834 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.82 vs. limit=15.0 2024-10-08 03:46:29,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=79656.0, ans=0.2 2024-10-08 03:46:33,947 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.210e+02 5.984e+02 7.099e+02 9.075e+02 1.629e+03, threshold=1.420e+03, percent-clipped=3.0 2024-10-08 03:46:52,122 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.65 vs. limit=12.0 2024-10-08 03:46:56,723 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=79776.0, ans=0.125 2024-10-08 03:46:59,302 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=5.48 vs. limit=15.0 2024-10-08 03:47:20,560 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.84 vs. limit=15.0 2024-10-08 03:47:32,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=80016.0, ans=0.0 2024-10-08 03:47:35,396 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=80016.0, ans=0.015 2024-10-08 03:47:41,676 INFO [train.py:1136] (0/2) Epoch 9, batch 150, loss[loss=0.222, simple_loss=0.3135, pruned_loss=0.06523, over 87068.00 frames. ], tot_loss[loss=0.2449, simple_loss=0.3316, pruned_loss=0.07911, over 9051977.76 frames. 
], batch size: 350, lr: 2.46e-02, grad_scale: 16.0 2024-10-08 03:49:16,148 INFO [train.py:1136] (0/2) Epoch 9, batch 200, loss[loss=0.2286, simple_loss=0.3136, pruned_loss=0.07176, over 87357.00 frames. ], tot_loss[loss=0.2429, simple_loss=0.3298, pruned_loss=0.07803, over 10844688.35 frames. ], batch size: 313, lr: 2.46e-02, grad_scale: 16.0 2024-10-08 03:49:28,185 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=80736.0, ans=0.125 2024-10-08 03:49:38,238 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.383e+02 5.498e+02 6.582e+02 8.066e+02 1.603e+03, threshold=1.316e+03, percent-clipped=1.0 2024-10-08 03:49:59,637 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=80976.0, ans=0.125 2024-10-08 03:50:16,094 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=81096.0, ans=6.0 2024-10-08 03:50:22,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=81096.0, ans=0.0 2024-10-08 03:50:37,728 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.77 vs. limit=22.5 2024-10-08 03:50:46,443 INFO [train.py:1136] (0/2) Epoch 9, batch 250, loss[loss=0.2403, simple_loss=0.3319, pruned_loss=0.07434, over 86482.00 frames. ], tot_loss[loss=0.2424, simple_loss=0.3298, pruned_loss=0.07751, over 12251037.77 frames. ], batch size: 668, lr: 2.45e-02, grad_scale: 16.0 2024-10-08 03:50:56,147 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=81336.0, ans=0.1 2024-10-08 03:51:10,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=81456.0, ans=0.1 2024-10-08 03:51:13,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=81456.0, ans=0.125 2024-10-08 03:51:18,983 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.81 vs. limit=15.0 2024-10-08 03:52:03,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=81816.0, ans=0.025 2024-10-08 03:52:04,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=81816.0, ans=0.0 2024-10-08 03:52:18,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=81816.0, ans=0.0 2024-10-08 03:52:22,157 INFO [train.py:1136] (0/2) Epoch 9, batch 300, loss[loss=0.229, simple_loss=0.3265, pruned_loss=0.0658, over 87170.00 frames. ], tot_loss[loss=0.2423, simple_loss=0.3296, pruned_loss=0.07743, over 13329769.12 frames. 
], batch size: 517, lr: 2.45e-02, grad_scale: 16.0 2024-10-08 03:52:38,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=81936.0, ans=0.0 2024-10-08 03:52:46,138 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.427e+02 5.677e+02 6.713e+02 8.209e+02 1.433e+03, threshold=1.343e+03, percent-clipped=6.0 2024-10-08 03:52:54,273 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=82056.0, ans=0.1 2024-10-08 03:53:08,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=82176.0, ans=0.125 2024-10-08 03:53:15,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=82176.0, ans=0.1 2024-10-08 03:53:25,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=82296.0, ans=0.125 2024-10-08 03:53:40,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=82416.0, ans=0.125 2024-10-08 03:53:43,327 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.38 vs. limit=15.0 2024-10-08 03:53:49,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=82416.0, ans=0.125 2024-10-08 03:53:58,286 INFO [train.py:1136] (0/2) Epoch 9, batch 350, loss[loss=0.2246, simple_loss=0.3187, pruned_loss=0.0653, over 87259.00 frames. ], tot_loss[loss=0.2415, simple_loss=0.3288, pruned_loss=0.0771, over 14159148.18 frames. ], batch size: 517, lr: 2.45e-02, grad_scale: 16.0 2024-10-08 03:54:27,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=82656.0, ans=0.125 2024-10-08 03:54:54,313 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=82896.0, ans=0.125 2024-10-08 03:55:34,195 INFO [train.py:1136] (0/2) Epoch 9, batch 400, loss[loss=0.2221, simple_loss=0.3163, pruned_loss=0.06392, over 87347.00 frames. ], tot_loss[loss=0.2404, simple_loss=0.3278, pruned_loss=0.07649, over 14832868.73 frames. ], batch size: 415, lr: 2.44e-02, grad_scale: 32.0 2024-10-08 03:55:52,535 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.51 vs. limit=22.5 2024-10-08 03:55:55,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=83256.0, ans=0.125 2024-10-08 03:55:56,698 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.165e+02 5.646e+02 6.493e+02 7.992e+02 2.335e+03, threshold=1.299e+03, percent-clipped=7.0 2024-10-08 03:56:00,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=83256.0, ans=0.125 2024-10-08 03:56:43,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=83496.0, ans=0.125 2024-10-08 03:57:04,843 INFO [train.py:1136] (0/2) Epoch 9, batch 450, loss[loss=0.2232, simple_loss=0.3179, pruned_loss=0.06427, over 87281.00 frames. 
], tot_loss[loss=0.2407, simple_loss=0.328, pruned_loss=0.07672, over 15333079.17 frames. ], batch size: 439, lr: 2.44e-02, grad_scale: 8.0 2024-10-08 03:57:05,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=83736.0, ans=0.05 2024-10-08 03:57:27,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=83856.0, ans=0.025 2024-10-08 03:57:27,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=83856.0, ans=0.125 2024-10-08 03:57:36,001 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=83856.0, ans=0.1 2024-10-08 03:58:00,379 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=84096.0, ans=0.125 2024-10-08 03:58:15,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=84096.0, ans=0.1 2024-10-08 03:58:24,616 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.64 vs. limit=15.0 2024-10-08 03:58:29,528 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=84216.0, ans=0.2 2024-10-08 03:58:41,106 INFO [train.py:1136] (0/2) Epoch 9, batch 500, loss[loss=0.2095, simple_loss=0.2985, pruned_loss=0.06029, over 86346.00 frames. ], tot_loss[loss=0.2407, simple_loss=0.3277, pruned_loss=0.07686, over 15703963.54 frames. ], batch size: 197, lr: 2.43e-02, grad_scale: 8.0 2024-10-08 03:58:43,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=84336.0, ans=0.125 2024-10-08 03:58:46,719 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=84336.0, ans=0.05 2024-10-08 03:58:49,836 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=84336.0, ans=0.125 2024-10-08 03:59:11,823 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.355e+02 5.370e+02 6.035e+02 8.022e+02 1.378e+03, threshold=1.207e+03, percent-clipped=1.0 2024-10-08 03:59:22,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=84576.0, ans=0.025 2024-10-08 03:59:46,249 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=84696.0, ans=0.07 2024-10-08 04:00:05,648 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.26 vs. limit=15.0 2024-10-08 04:00:08,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=84816.0, ans=0.0 2024-10-08 04:00:10,285 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=84816.0, ans=0.0 2024-10-08 04:00:16,756 INFO [train.py:1136] (0/2) Epoch 9, batch 550, loss[loss=0.2418, simple_loss=0.3308, pruned_loss=0.07643, over 86374.00 frames. 
], tot_loss[loss=0.241, simple_loss=0.3281, pruned_loss=0.07695, over 15990455.52 frames. ], batch size: 667, lr: 2.43e-02, grad_scale: 8.0 2024-10-08 04:01:02,800 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=85176.0, ans=0.04949747468305833 2024-10-08 04:01:26,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=85296.0, ans=0.025 2024-10-08 04:01:53,200 INFO [train.py:1136] (0/2) Epoch 9, batch 600, loss[loss=0.2438, simple_loss=0.3308, pruned_loss=0.07835, over 86921.00 frames. ], tot_loss[loss=0.2411, simple_loss=0.3282, pruned_loss=0.07695, over 16235495.95 frames. ], batch size: 583, lr: 2.43e-02, grad_scale: 8.0 2024-10-08 04:02:08,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=85656.0, ans=0.0 2024-10-08 04:02:18,803 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.251e+02 5.363e+02 6.500e+02 8.152e+02 1.269e+03, threshold=1.300e+03, percent-clipped=1.0 2024-10-08 04:02:24,879 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.94 vs. limit=15.0 2024-10-08 04:02:31,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=85776.0, ans=0.2 2024-10-08 04:03:01,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=85896.0, ans=0.125 2024-10-08 04:03:06,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=86016.0, ans=0.2 2024-10-08 04:03:20,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=86016.0, ans=0.2 2024-10-08 04:03:26,175 INFO [train.py:1136] (0/2) Epoch 9, batch 650, loss[loss=0.212, simple_loss=0.3057, pruned_loss=0.05916, over 86927.00 frames. ], tot_loss[loss=0.2404, simple_loss=0.3281, pruned_loss=0.07641, over 16440501.14 frames. ], batch size: 350, lr: 2.42e-02, grad_scale: 8.0 2024-10-08 04:03:58,703 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.86 vs. limit=22.5 2024-10-08 04:04:25,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=86496.0, ans=0.025 2024-10-08 04:04:45,191 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.18 vs. limit=22.5 2024-10-08 04:04:45,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=86616.0, ans=0.125 2024-10-08 04:04:56,801 INFO [train.py:1136] (0/2) Epoch 9, batch 700, loss[loss=0.2279, simple_loss=0.3217, pruned_loss=0.06703, over 87194.00 frames. ], tot_loss[loss=0.2413, simple_loss=0.3288, pruned_loss=0.07689, over 16584610.23 frames. 
], batch size: 350, lr: 2.42e-02, grad_scale: 8.0 2024-10-08 04:05:20,533 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.265e+02 5.299e+02 6.229e+02 7.687e+02 1.424e+03, threshold=1.246e+03, percent-clipped=5.0 2024-10-08 04:05:33,672 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=86976.0, ans=0.0 2024-10-08 04:05:41,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=86976.0, ans=0.125 2024-10-08 04:05:47,916 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=87096.0, ans=0.025 2024-10-08 04:05:56,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=87096.0, ans=0.125 2024-10-08 04:06:20,089 INFO [train.py:1136] (0/2) Epoch 9, batch 750, loss[loss=0.2949, simple_loss=0.3612, pruned_loss=0.1144, over 69286.00 frames. ], tot_loss[loss=0.2428, simple_loss=0.3301, pruned_loss=0.07774, over 16665933.67 frames. ], batch size: 1960, lr: 2.41e-02, grad_scale: 8.0 2024-10-08 04:06:40,280 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-10-08 04:06:45,006 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=87456.0, ans=0.2 2024-10-08 04:06:49,787 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=87456.0, ans=0.0 2024-10-08 04:06:57,935 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=87576.0, ans=0.02 2024-10-08 04:07:07,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=87576.0, ans=0.125 2024-10-08 04:07:32,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=87816.0, ans=0.2 2024-10-08 04:07:44,084 INFO [train.py:1136] (0/2) Epoch 9, batch 800, loss[loss=0.2629, simple_loss=0.3515, pruned_loss=0.08719, over 83237.00 frames. ], tot_loss[loss=0.2442, simple_loss=0.3312, pruned_loss=0.07854, over 16707206.80 frames. ], batch size: 1077, lr: 2.41e-02, grad_scale: 16.0 2024-10-08 04:07:46,510 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.86 vs. limit=6.0 2024-10-08 04:07:49,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=87936.0, ans=0.125 2024-10-08 04:07:59,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=88056.0, ans=0.125 2024-10-08 04:08:07,880 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.433e+02 5.776e+02 6.801e+02 7.821e+02 1.497e+03, threshold=1.360e+03, percent-clipped=1.0 2024-10-08 04:08:09,147 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-9.pt 2024-10-08 04:09:06,879 INFO [train.py:1136] (0/2) Epoch 10, batch 0, loss[loss=0.2267, simple_loss=0.3245, pruned_loss=0.06443, over 87350.00 frames. ], tot_loss[loss=0.2267, simple_loss=0.3245, pruned_loss=0.06443, over 87350.00 frames. 
], batch size: 464, lr: 2.29e-02, grad_scale: 32.0 2024-10-08 04:09:06,880 INFO [train.py:1159] (0/2) Computing validation loss 2024-10-08 04:09:17,839 INFO [train.py:1168] (0/2) Epoch 10, validation: loss=0.1839, simple_loss=0.3001, pruned_loss=0.03385, over 1382211.00 frames. 2024-10-08 04:09:17,839 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB 2024-10-08 04:09:26,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=88128.0, ans=0.0 2024-10-08 04:09:50,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=88248.0, ans=0.125 2024-10-08 04:09:52,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=88248.0, ans=0.2 2024-10-08 04:10:05,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=88368.0, ans=0.2 2024-10-08 04:10:08,074 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.20 vs. limit=15.0 2024-10-08 04:10:37,609 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.02 vs. limit=15.0 2024-10-08 04:10:40,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=88608.0, ans=0.125 2024-10-08 04:10:55,081 INFO [train.py:1136] (0/2) Epoch 10, batch 50, loss[loss=0.2216, simple_loss=0.3139, pruned_loss=0.06469, over 87270.00 frames. ], tot_loss[loss=0.2396, simple_loss=0.3268, pruned_loss=0.07615, over 3839035.56 frames. ], batch size: 415, lr: 2.29e-02, grad_scale: 16.0 2024-10-08 04:11:13,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=88848.0, ans=0.0 2024-10-08 04:11:27,936 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=88848.0, ans=0.025 2024-10-08 04:11:32,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=88968.0, ans=0.125 2024-10-08 04:11:32,972 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=88968.0, ans=0.2 2024-10-08 04:11:55,136 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=89088.0, ans=0.0 2024-10-08 04:11:55,879 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.66 vs. limit=15.0 2024-10-08 04:12:31,278 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.313e+02 5.201e+02 6.055e+02 6.745e+02 1.150e+03, threshold=1.211e+03, percent-clipped=0.0 2024-10-08 04:12:31,297 INFO [train.py:1136] (0/2) Epoch 10, batch 100, loss[loss=0.2215, simple_loss=0.3044, pruned_loss=0.0693, over 85738.00 frames. ], tot_loss[loss=0.2353, simple_loss=0.3236, pruned_loss=0.0735, over 6792862.83 frames. 
], batch size: 180, lr: 2.28e-02, grad_scale: 16.0 2024-10-08 04:13:40,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=89688.0, ans=0.02 2024-10-08 04:13:45,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=89808.0, ans=0.0 2024-10-08 04:14:05,091 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.36 vs. limit=22.5 2024-10-08 04:14:05,823 INFO [train.py:1136] (0/2) Epoch 10, batch 150, loss[loss=0.2145, simple_loss=0.3002, pruned_loss=0.06444, over 85281.00 frames. ], tot_loss[loss=0.2353, simple_loss=0.3236, pruned_loss=0.0735, over 9064044.24 frames. ], batch size: 180, lr: 2.28e-02, grad_scale: 16.0 2024-10-08 04:14:11,637 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=89928.0, ans=0.125 2024-10-08 04:14:28,738 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=90048.0, ans=0.125 2024-10-08 04:14:28,846 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-10-08 04:14:30,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=90048.0, ans=0.1 2024-10-08 04:15:36,633 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.507e+02 5.404e+02 6.201e+02 7.904e+02 1.350e+03, threshold=1.240e+03, percent-clipped=1.0 2024-10-08 04:15:36,652 INFO [train.py:1136] (0/2) Epoch 10, batch 200, loss[loss=0.2631, simple_loss=0.3511, pruned_loss=0.08757, over 83309.00 frames. ], tot_loss[loss=0.2371, simple_loss=0.3254, pruned_loss=0.07441, over 10852272.64 frames. ], batch size: 1077, lr: 2.27e-02, grad_scale: 16.0 2024-10-08 04:15:45,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=90528.0, ans=0.5 2024-10-08 04:15:59,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=90648.0, ans=0.125 2024-10-08 04:16:04,396 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=90648.0, ans=0.0 2024-10-08 04:16:23,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=90768.0, ans=0.125 2024-10-08 04:16:28,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=90768.0, ans=0.0 2024-10-08 04:16:46,809 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=90888.0, ans=0.1 2024-10-08 04:17:01,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=91008.0, ans=0.125 2024-10-08 04:17:11,669 INFO [train.py:1136] (0/2) Epoch 10, batch 250, loss[loss=0.2231, simple_loss=0.3179, pruned_loss=0.06416, over 87337.00 frames. ], tot_loss[loss=0.2347, simple_loss=0.3233, pruned_loss=0.07308, over 12254381.73 frames. 
], batch size: 372, lr: 2.27e-02, grad_scale: 16.0 2024-10-08 04:17:20,604 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=91128.0, ans=0.95 2024-10-08 04:17:24,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=91128.0, ans=0.1 2024-10-08 04:17:27,681 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=91248.0, ans=0.125 2024-10-08 04:17:48,273 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=91248.0, ans=0.125 2024-10-08 04:17:53,207 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=91368.0, ans=0.125 2024-10-08 04:18:01,912 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=91368.0, ans=0.0 2024-10-08 04:18:10,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=91488.0, ans=0.0 2024-10-08 04:18:14,952 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.16 vs. limit=6.0 2024-10-08 04:18:17,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=91488.0, ans=0.125 2024-10-08 04:18:47,915 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.130e+02 5.077e+02 5.821e+02 6.669e+02 1.555e+03, threshold=1.164e+03, percent-clipped=1.0 2024-10-08 04:18:47,934 INFO [train.py:1136] (0/2) Epoch 10, batch 300, loss[loss=0.2331, simple_loss=0.3266, pruned_loss=0.06981, over 86801.00 frames. ], tot_loss[loss=0.2352, simple_loss=0.324, pruned_loss=0.07317, over 13320741.39 frames. ], batch size: 547, lr: 2.27e-02, grad_scale: 16.0 2024-10-08 04:18:48,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=91728.0, ans=0.0 2024-10-08 04:19:00,982 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.03 vs. limit=22.5 2024-10-08 04:19:10,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=91848.0, ans=0.0 2024-10-08 04:19:22,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=91968.0, ans=0.125 2024-10-08 04:19:32,781 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.80 vs. limit=10.0 2024-10-08 04:19:50,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=92088.0, ans=0.09899494936611666 2024-10-08 04:20:00,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=92208.0, ans=0.1 2024-10-08 04:20:20,047 INFO [train.py:1136] (0/2) Epoch 10, batch 350, loss[loss=0.2996, simple_loss=0.3699, pruned_loss=0.1146, over 78617.00 frames. ], tot_loss[loss=0.2343, simple_loss=0.3234, pruned_loss=0.07262, over 14190689.95 frames. 
], batch size: 1493, lr: 2.26e-02, grad_scale: 16.0 2024-10-08 04:20:45,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=92448.0, ans=0.2 2024-10-08 04:21:17,021 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.74 vs. limit=15.0 2024-10-08 04:21:39,470 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.36 vs. limit=22.5 2024-10-08 04:21:56,106 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.923e+02 5.207e+02 6.158e+02 7.975e+02 1.342e+03, threshold=1.232e+03, percent-clipped=3.0 2024-10-08 04:21:56,125 INFO [train.py:1136] (0/2) Epoch 10, batch 400, loss[loss=0.2825, simple_loss=0.3621, pruned_loss=0.1014, over 81959.00 frames. ], tot_loss[loss=0.2351, simple_loss=0.3242, pruned_loss=0.07301, over 14838810.19 frames. ], batch size: 1245, lr: 2.26e-02, grad_scale: 32.0 2024-10-08 04:22:19,235 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=93048.0, ans=0.125 2024-10-08 04:22:29,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=93048.0, ans=0.2 2024-10-08 04:22:38,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=93168.0, ans=0.0 2024-10-08 04:23:28,604 INFO [train.py:1136] (0/2) Epoch 10, batch 450, loss[loss=0.2245, simple_loss=0.3195, pruned_loss=0.0648, over 87262.00 frames. ], tot_loss[loss=0.2343, simple_loss=0.3236, pruned_loss=0.07253, over 15364182.80 frames. ], batch size: 439, lr: 2.26e-02, grad_scale: 32.0 2024-10-08 04:24:28,855 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=93888.0, ans=0.125 2024-10-08 04:25:04,488 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.963e+02 4.951e+02 5.579e+02 6.565e+02 1.027e+03, threshold=1.116e+03, percent-clipped=0.0 2024-10-08 04:25:04,507 INFO [train.py:1136] (0/2) Epoch 10, batch 500, loss[loss=0.2247, simple_loss=0.3109, pruned_loss=0.06931, over 87234.00 frames. ], tot_loss[loss=0.2338, simple_loss=0.323, pruned_loss=0.07227, over 15757772.46 frames. ], batch size: 296, lr: 2.25e-02, grad_scale: 32.0 2024-10-08 04:25:40,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=94368.0, ans=0.025 2024-10-08 04:26:01,357 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=94488.0, ans=0.07 2024-10-08 04:26:23,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=94608.0, ans=0.0 2024-10-08 04:26:24,787 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=94608.0, ans=0.0 2024-10-08 04:26:33,723 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.03 vs. 
limit=15.0 2024-10-08 04:26:39,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=94728.0, ans=10.0 2024-10-08 04:26:40,528 INFO [train.py:1136] (0/2) Epoch 10, batch 550, loss[loss=0.2131, simple_loss=0.3109, pruned_loss=0.05763, over 87181.00 frames. ], tot_loss[loss=0.2344, simple_loss=0.3235, pruned_loss=0.07267, over 16040903.13 frames. ], batch size: 517, lr: 2.25e-02, grad_scale: 32.0 2024-10-08 04:26:44,475 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=94728.0, ans=0.0 2024-10-08 04:26:50,257 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.58 vs. limit=10.0 2024-10-08 04:27:03,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=94848.0, ans=0.0 2024-10-08 04:27:11,914 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=94848.0, ans=0.0 2024-10-08 04:27:13,813 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=94848.0, ans=0.125 2024-10-08 04:28:11,666 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-10-08 04:28:12,855 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.968e+02 5.193e+02 6.106e+02 7.133e+02 1.489e+03, threshold=1.221e+03, percent-clipped=6.0 2024-10-08 04:28:12,874 INFO [train.py:1136] (0/2) Epoch 10, batch 600, loss[loss=0.2156, simple_loss=0.3105, pruned_loss=0.06033, over 87411.00 frames. ], tot_loss[loss=0.2341, simple_loss=0.3234, pruned_loss=0.07241, over 16287328.68 frames. ], batch size: 393, lr: 2.24e-02, grad_scale: 32.0 2024-10-08 04:28:48,932 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.43 vs. limit=15.0 2024-10-08 04:29:39,770 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=95808.0, ans=0.0 2024-10-08 04:29:50,767 INFO [train.py:1136] (0/2) Epoch 10, batch 650, loss[loss=0.2223, simple_loss=0.3016, pruned_loss=0.07147, over 85517.00 frames. ], tot_loss[loss=0.2336, simple_loss=0.3228, pruned_loss=0.07221, over 16476230.44 frames. ], batch size: 180, lr: 2.24e-02, grad_scale: 16.0 2024-10-08 04:29:54,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=95928.0, ans=0.125 2024-10-08 04:29:59,577 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/checkpoint-8000.pt 2024-10-08 04:31:14,437 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.66 vs. limit=22.5 2024-10-08 04:31:29,231 INFO [train.py:1136] (0/2) Epoch 10, batch 700, loss[loss=0.2298, simple_loss=0.323, pruned_loss=0.06831, over 86942.00 frames. ], tot_loss[loss=0.2334, simple_loss=0.3227, pruned_loss=0.07207, over 16636727.12 frames. 
], batch size: 547, lr: 2.24e-02, grad_scale: 16.0 2024-10-08 04:31:30,894 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.330e+02 5.552e+02 6.542e+02 7.784e+02 1.535e+03, threshold=1.308e+03, percent-clipped=2.0 2024-10-08 04:31:41,014 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.13 vs. limit=15.0 2024-10-08 04:31:55,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=96648.0, ans=0.0 2024-10-08 04:32:16,249 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.41 vs. limit=10.0 2024-10-08 04:32:17,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=96888.0, ans=0.125 2024-10-08 04:32:31,758 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=96888.0, ans=0.2 2024-10-08 04:32:34,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=97008.0, ans=0.0 2024-10-08 04:32:36,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=97008.0, ans=0.125 2024-10-08 04:32:39,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=97008.0, ans=0.0 2024-10-08 04:32:43,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=97008.0, ans=0.0 2024-10-08 04:32:52,158 INFO [train.py:1136] (0/2) Epoch 10, batch 750, loss[loss=0.2575, simple_loss=0.3446, pruned_loss=0.08521, over 83382.00 frames. ], tot_loss[loss=0.2347, simple_loss=0.3237, pruned_loss=0.07284, over 16693826.18 frames. ], batch size: 1077, lr: 2.23e-02, grad_scale: 16.0 2024-10-08 04:33:01,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=97128.0, ans=0.125 2024-10-08 04:33:24,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=97368.0, ans=0.0 2024-10-08 04:33:28,050 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=97368.0, ans=0.04949747468305833 2024-10-08 04:33:46,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=97488.0, ans=0.025 2024-10-08 04:34:00,856 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.52 vs. limit=15.0 2024-10-08 04:34:15,051 INFO [train.py:1136] (0/2) Epoch 10, batch 800, loss[loss=0.221, simple_loss=0.3126, pruned_loss=0.06466, over 87452.00 frames. ], tot_loss[loss=0.2353, simple_loss=0.3242, pruned_loss=0.07319, over 16732570.81 frames. 
], batch size: 393, lr: 2.23e-02, grad_scale: 32.0 2024-10-08 04:34:16,686 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.214e+02 5.157e+02 5.897e+02 7.048e+02 1.394e+03, threshold=1.179e+03, percent-clipped=3.0 2024-10-08 04:34:41,292 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-10.pt 2024-10-08 04:35:23,649 INFO [train.py:1136] (0/2) Epoch 11, batch 0, loss[loss=0.2228, simple_loss=0.3156, pruned_loss=0.06504, over 87469.00 frames. ], tot_loss[loss=0.2228, simple_loss=0.3156, pruned_loss=0.06504, over 87469.00 frames. ], batch size: 372, lr: 2.13e-02, grad_scale: 32.0 2024-10-08 04:35:23,650 INFO [train.py:1159] (0/2) Computing validation loss 2024-10-08 04:35:34,887 INFO [train.py:1168] (0/2) Epoch 11, validation: loss=0.1827, simple_loss=0.2991, pruned_loss=0.03311, over 1382211.00 frames. 2024-10-08 04:35:34,887 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB 2024-10-08 04:35:42,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=97920.0, ans=0.125 2024-10-08 04:35:57,555 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.05 vs. limit=15.0 2024-10-08 04:36:12,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=98160.0, ans=0.125 2024-10-08 04:36:26,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=98160.0, ans=10.0 2024-10-08 04:36:27,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=98280.0, ans=10.0 2024-10-08 04:36:36,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=98280.0, ans=0.125 2024-10-08 04:36:37,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=98280.0, ans=0.125 2024-10-08 04:37:03,505 INFO [train.py:1136] (0/2) Epoch 11, batch 50, loss[loss=0.2171, simple_loss=0.3117, pruned_loss=0.06125, over 87247.00 frames. ], tot_loss[loss=0.232, simple_loss=0.322, pruned_loss=0.07096, over 3871649.67 frames. ], batch size: 393, lr: 2.12e-02, grad_scale: 16.0 2024-10-08 04:37:14,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=98520.0, ans=0.125 2024-10-08 04:37:17,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=98520.0, ans=0.1 2024-10-08 04:37:19,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=98520.0, ans=0.1 2024-10-08 04:37:24,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=98640.0, ans=0.0 2024-10-08 04:37:46,411 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.86 vs. 
limit=22.5 2024-10-08 04:38:02,886 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=98760.0, ans=0.035 2024-10-08 04:38:16,310 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.167e+02 5.147e+02 5.905e+02 7.381e+02 1.305e+03, threshold=1.181e+03, percent-clipped=1.0 2024-10-08 04:38:20,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=98880.0, ans=0.2 2024-10-08 04:38:42,663 INFO [train.py:1136] (0/2) Epoch 11, batch 100, loss[loss=0.2057, simple_loss=0.2941, pruned_loss=0.05868, over 86589.00 frames. ], tot_loss[loss=0.2299, simple_loss=0.3203, pruned_loss=0.06971, over 6816071.05 frames. ], batch size: 213, lr: 2.12e-02, grad_scale: 16.0 2024-10-08 04:39:09,718 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.672e-03 2024-10-08 04:39:13,413 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.58 vs. limit=15.0 2024-10-08 04:39:45,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=99480.0, ans=0.0 2024-10-08 04:39:59,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=99600.0, ans=0.125 2024-10-08 04:40:12,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=99600.0, ans=0.125 2024-10-08 04:40:15,583 INFO [train.py:1136] (0/2) Epoch 11, batch 150, loss[loss=0.2616, simple_loss=0.349, pruned_loss=0.08709, over 83611.00 frames. ], tot_loss[loss=0.2297, simple_loss=0.3204, pruned_loss=0.06955, over 9084196.55 frames. ], batch size: 1079, lr: 2.12e-02, grad_scale: 16.0 2024-10-08 04:41:06,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=99960.0, ans=0.015 2024-10-08 04:41:06,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=99960.0, ans=0.1 2024-10-08 04:41:12,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=100080.0, ans=0.0 2024-10-08 04:41:17,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=100080.0, ans=0.1 2024-10-08 04:41:24,395 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.175e+02 4.812e+02 5.216e+02 5.790e+02 8.023e+02, threshold=1.043e+03, percent-clipped=0.0 2024-10-08 04:41:26,711 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=100080.0, ans=0.02 2024-10-08 04:41:51,542 INFO [train.py:1136] (0/2) Epoch 11, batch 200, loss[loss=0.2222, simple_loss=0.3109, pruned_loss=0.06673, over 87298.00 frames. ], tot_loss[loss=0.23, simple_loss=0.32, pruned_loss=0.06994, over 10850238.20 frames. ], batch size: 313, lr: 2.11e-02, grad_scale: 16.0 2024-10-08 04:42:36,233 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=100560.0, ans=0.125 2024-10-08 04:42:39,074 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.41 vs. 
limit=5.0 2024-10-08 04:42:48,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=100680.0, ans=0.125 2024-10-08 04:43:00,724 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.55 vs. limit=15.0 2024-10-08 04:43:26,629 INFO [train.py:1136] (0/2) Epoch 11, batch 250, loss[loss=0.2358, simple_loss=0.3273, pruned_loss=0.07211, over 86345.00 frames. ], tot_loss[loss=0.2301, simple_loss=0.3205, pruned_loss=0.06988, over 12239119.44 frames. ], batch size: 667, lr: 2.11e-02, grad_scale: 16.0 2024-10-08 04:43:49,137 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.61 vs. limit=6.0 2024-10-08 04:43:59,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=101040.0, ans=0.09899494936611666 2024-10-08 04:44:09,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=101160.0, ans=0.125 2024-10-08 04:44:26,464 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.44 vs. limit=15.0 2024-10-08 04:44:33,820 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.899e+02 5.070e+02 5.670e+02 6.505e+02 9.276e+02, threshold=1.134e+03, percent-clipped=0.0 2024-10-08 04:44:44,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=101400.0, ans=0.0 2024-10-08 04:44:53,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=101400.0, ans=0.0 2024-10-08 04:45:00,249 INFO [train.py:1136] (0/2) Epoch 11, batch 300, loss[loss=0.2038, simple_loss=0.296, pruned_loss=0.05583, over 86659.00 frames. ], tot_loss[loss=0.2279, simple_loss=0.3184, pruned_loss=0.06869, over 13357830.86 frames. ], batch size: 246, lr: 2.11e-02, grad_scale: 16.0 2024-10-08 04:45:08,616 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0 2024-10-08 04:45:20,278 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=101640.0, ans=0.125 2024-10-08 04:45:25,756 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.33 vs. limit=15.0 2024-10-08 04:45:45,646 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.79 vs. limit=15.0 2024-10-08 04:46:35,944 INFO [train.py:1136] (0/2) Epoch 11, batch 350, loss[loss=0.2129, simple_loss=0.3103, pruned_loss=0.05772, over 87488.00 frames. ], tot_loss[loss=0.229, simple_loss=0.3195, pruned_loss=0.0692, over 14185351.40 frames. 
], batch size: 490, lr: 2.10e-02, grad_scale: 16.0 2024-10-08 04:46:43,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=102120.0, ans=0.125 2024-10-08 04:47:08,525 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=102240.0, ans=0.0 2024-10-08 04:47:46,633 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.997e+02 5.193e+02 6.155e+02 7.464e+02 1.233e+03, threshold=1.231e+03, percent-clipped=3.0 2024-10-08 04:47:50,822 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=102480.0, ans=0.125 2024-10-08 04:48:00,111 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=102600.0, ans=0.125 2024-10-08 04:48:14,297 INFO [train.py:1136] (0/2) Epoch 11, batch 400, loss[loss=0.2087, simple_loss=0.3065, pruned_loss=0.05545, over 87306.00 frames. ], tot_loss[loss=0.2318, simple_loss=0.3215, pruned_loss=0.07103, over 14758826.97 frames. ], batch size: 393, lr: 2.10e-02, grad_scale: 32.0 2024-10-08 04:48:14,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=102720.0, ans=0.125 2024-10-08 04:48:16,974 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.17 vs. limit=15.0 2024-10-08 04:48:54,637 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=102960.0, ans=0.0 2024-10-08 04:49:01,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=102960.0, ans=0.025 2024-10-08 04:49:47,477 INFO [train.py:1136] (0/2) Epoch 11, batch 450, loss[loss=0.2112, simple_loss=0.3085, pruned_loss=0.05692, over 87251.00 frames. ], tot_loss[loss=0.2306, simple_loss=0.3205, pruned_loss=0.07036, over 15287145.94 frames. ], batch size: 439, lr: 2.10e-02, grad_scale: 16.0 2024-10-08 04:49:51,523 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-10-08 04:50:27,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=103560.0, ans=0.125 2024-10-08 04:50:58,567 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.007e+02 4.671e+02 5.343e+02 6.298e+02 1.418e+03, threshold=1.069e+03, percent-clipped=1.0 2024-10-08 04:51:05,570 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.49 vs. limit=15.0 2024-10-08 04:51:25,236 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-10-08 04:51:28,250 INFO [train.py:1136] (0/2) Epoch 11, batch 500, loss[loss=0.2184, simple_loss=0.3152, pruned_loss=0.06086, over 87207.00 frames. ], tot_loss[loss=0.2307, simple_loss=0.3208, pruned_loss=0.07032, over 15679351.95 frames. 
], batch size: 517, lr: 2.09e-02, grad_scale: 16.0 2024-10-08 04:51:42,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=103920.0, ans=0.0 2024-10-08 04:52:13,338 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.56 vs. limit=15.0 2024-10-08 04:52:42,856 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.95 vs. limit=22.5 2024-10-08 04:52:46,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=104400.0, ans=0.05 2024-10-08 04:53:02,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=104520.0, ans=0.125 2024-10-08 04:53:03,882 INFO [train.py:1136] (0/2) Epoch 11, batch 550, loss[loss=0.2089, simple_loss=0.3033, pruned_loss=0.05722, over 87324.00 frames. ], tot_loss[loss=0.2306, simple_loss=0.3209, pruned_loss=0.07022, over 15983771.76 frames. ], batch size: 393, lr: 2.09e-02, grad_scale: 16.0 2024-10-08 04:53:10,328 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=104520.0, ans=0.125 2024-10-08 04:53:19,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=104520.0, ans=0.025 2024-10-08 04:53:19,250 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=104520.0, ans=0.2 2024-10-08 04:53:21,540 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.76 vs. limit=15.0 2024-10-08 04:53:21,842 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.69 vs. limit=15.0 2024-10-08 04:53:33,975 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-10-08 04:54:10,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=104880.0, ans=0.1 2024-10-08 04:54:13,810 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.217e+02 5.120e+02 5.936e+02 6.736e+02 1.051e+03, threshold=1.187e+03, percent-clipped=0.0 2024-10-08 04:54:50,967 INFO [train.py:1136] (0/2) Epoch 11, batch 600, loss[loss=0.2379, simple_loss=0.3325, pruned_loss=0.07166, over 85662.00 frames. ], tot_loss[loss=0.23, simple_loss=0.3204, pruned_loss=0.06985, over 16233353.49 frames. ], batch size: 787, lr: 2.09e-02, grad_scale: 16.0 2024-10-08 04:55:16,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=105240.0, ans=0.125 2024-10-08 04:55:25,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=105240.0, ans=22.5 2024-10-08 04:56:03,240 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.16 vs. limit=15.0 2024-10-08 04:56:24,890 INFO [train.py:1136] (0/2) Epoch 11, batch 650, loss[loss=0.2122, simple_loss=0.3079, pruned_loss=0.05828, over 87202.00 frames. 
], tot_loss[loss=0.2297, simple_loss=0.3203, pruned_loss=0.06959, over 16436622.91 frames. ], batch size: 517, lr: 2.08e-02, grad_scale: 16.0 2024-10-08 04:57:04,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=105960.0, ans=0.025 2024-10-08 04:57:10,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=105960.0, ans=0.0 2024-10-08 04:57:23,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=106080.0, ans=0.025 2024-10-08 04:57:29,441 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.140e+02 5.014e+02 5.837e+02 7.061e+02 1.168e+03, threshold=1.167e+03, percent-clipped=0.0 2024-10-08 04:57:38,432 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.39 vs. limit=15.0 2024-10-08 04:57:52,304 INFO [train.py:1136] (0/2) Epoch 11, batch 700, loss[loss=0.2394, simple_loss=0.3284, pruned_loss=0.07523, over 86476.00 frames. ], tot_loss[loss=0.2306, simple_loss=0.3208, pruned_loss=0.07016, over 16554003.75 frames. ], batch size: 620, lr: 2.08e-02, grad_scale: 16.0 2024-10-08 04:58:10,256 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=106440.0, ans=0.0 2024-10-08 04:58:29,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=106560.0, ans=0.125 2024-10-08 04:58:40,794 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.07 vs. limit=15.0 2024-10-08 04:58:41,821 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=106680.0, ans=0.05 2024-10-08 04:58:52,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=106680.0, ans=0.0 2024-10-08 04:59:15,858 INFO [train.py:1136] (0/2) Epoch 11, batch 750, loss[loss=0.2141, simple_loss=0.3125, pruned_loss=0.05783, over 87419.00 frames. ], tot_loss[loss=0.2316, simple_loss=0.3214, pruned_loss=0.07093, over 16645060.19 frames. ], batch size: 439, lr: 2.08e-02, grad_scale: 16.0 2024-10-08 04:59:17,860 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=106920.0, ans=0.025 2024-10-08 04:59:33,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=107040.0, ans=0.125 2024-10-08 04:59:36,678 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=107040.0, ans=0.0 2024-10-08 04:59:59,855 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.91 vs. 
limit=15.0 2024-10-08 05:00:16,526 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.898e+02 5.138e+02 5.887e+02 7.050e+02 1.118e+03, threshold=1.177e+03, percent-clipped=0.0 2024-10-08 05:00:20,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=107400.0, ans=0.1 2024-10-08 05:00:38,306 INFO [train.py:1136] (0/2) Epoch 11, batch 800, loss[loss=0.2007, simple_loss=0.2932, pruned_loss=0.05409, over 86528.00 frames. ], tot_loss[loss=0.2318, simple_loss=0.3215, pruned_loss=0.07102, over 16689796.88 frames. ], batch size: 213, lr: 2.07e-02, grad_scale: 32.0 2024-10-08 05:01:03,226 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-11.pt 2024-10-08 05:01:56,664 INFO [train.py:1136] (0/2) Epoch 12, batch 0, loss[loss=0.2187, simple_loss=0.3145, pruned_loss=0.06142, over 87407.00 frames. ], tot_loss[loss=0.2187, simple_loss=0.3145, pruned_loss=0.06142, over 87407.00 frames. ], batch size: 464, lr: 1.98e-02, grad_scale: 32.0 2024-10-08 05:01:56,666 INFO [train.py:1159] (0/2) Computing validation loss 2024-10-08 05:02:05,675 INFO [zipformer.py:1883] (0/2) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.7320, 5.1674, 5.5947, 5.0970], device='cuda:0') 2024-10-08 05:02:08,304 INFO [train.py:1168] (0/2) Epoch 12, validation: loss=0.1786, simple_loss=0.2938, pruned_loss=0.03163, over 1382211.00 frames. 2024-10-08 05:02:08,305 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB 2024-10-08 05:02:08,821 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=107712.0, ans=0.1 2024-10-08 05:02:29,026 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-10-08 05:02:30,685 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=107832.0, ans=0.0 2024-10-08 05:02:46,511 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.93 vs. limit=15.0 2024-10-08 05:02:50,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=107952.0, ans=0.0 2024-10-08 05:03:14,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=108072.0, ans=0.125 2024-10-08 05:03:16,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=108072.0, ans=0.125 2024-10-08 05:03:39,606 INFO [train.py:1136] (0/2) Epoch 12, batch 50, loss[loss=0.244, simple_loss=0.3336, pruned_loss=0.07722, over 85463.00 frames. ], tot_loss[loss=0.226, simple_loss=0.3167, pruned_loss=0.06767, over 3858071.21 frames. 
], batch size: 787, lr: 1.98e-02, grad_scale: 32.0 2024-10-08 05:03:53,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=108312.0, ans=0.0 2024-10-08 05:03:59,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=108432.0, ans=0.125 2024-10-08 05:04:04,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=108432.0, ans=0.0 2024-10-08 05:04:20,506 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.999e+02 4.970e+02 5.563e+02 6.874e+02 1.103e+03, threshold=1.113e+03, percent-clipped=0.0 2024-10-08 05:05:17,360 INFO [train.py:1136] (0/2) Epoch 12, batch 100, loss[loss=0.2086, simple_loss=0.3044, pruned_loss=0.0564, over 87137.00 frames. ], tot_loss[loss=0.2274, simple_loss=0.3177, pruned_loss=0.0685, over 6746643.22 frames. ], batch size: 330, lr: 1.98e-02, grad_scale: 32.0 2024-10-08 05:05:36,358 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.43 vs. limit=15.0 2024-10-08 05:05:44,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=109032.0, ans=0.0 2024-10-08 05:06:20,184 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.82 vs. limit=15.0 2024-10-08 05:06:53,064 INFO [train.py:1136] (0/2) Epoch 12, batch 150, loss[loss=0.2149, simple_loss=0.3041, pruned_loss=0.06288, over 87320.00 frames. ], tot_loss[loss=0.2268, simple_loss=0.3177, pruned_loss=0.068, over 9064839.72 frames. ], batch size: 313, lr: 1.97e-02, grad_scale: 32.0 2024-10-08 05:07:26,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=109632.0, ans=0.95 2024-10-08 05:07:31,334 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.898e+02 4.783e+02 5.502e+02 6.212e+02 9.689e+02, threshold=1.100e+03, percent-clipped=0.0 2024-10-08 05:07:40,276 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=109752.0, ans=0.1 2024-10-08 05:07:48,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=109872.0, ans=0.025 2024-10-08 05:08:22,615 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.75 vs. limit=15.0 2024-10-08 05:08:23,357 INFO [train.py:1136] (0/2) Epoch 12, batch 200, loss[loss=0.2527, simple_loss=0.343, pruned_loss=0.08123, over 83531.00 frames. ], tot_loss[loss=0.2255, simple_loss=0.3167, pruned_loss=0.0671, over 10865321.91 frames. ], batch size: 1079, lr: 1.97e-02, grad_scale: 32.0 2024-10-08 05:08:33,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=110112.0, ans=0.1 2024-10-08 05:08:44,303 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=110232.0, ans=0.125 2024-10-08 05:09:16,406 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.76 vs. 
limit=15.0 2024-10-08 05:09:43,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=110592.0, ans=0.1 2024-10-08 05:09:47,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=110592.0, ans=0.125 2024-10-08 05:09:49,098 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=110592.0, ans=0.125 2024-10-08 05:10:02,826 INFO [train.py:1136] (0/2) Epoch 12, batch 250, loss[loss=0.2171, simple_loss=0.3118, pruned_loss=0.06122, over 87368.00 frames. ], tot_loss[loss=0.2249, simple_loss=0.3165, pruned_loss=0.06664, over 12247646.91 frames. ], batch size: 372, lr: 1.97e-02, grad_scale: 16.0 2024-10-08 05:10:36,973 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=110832.0, ans=0.125 2024-10-08 05:10:38,648 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=110952.0, ans=0.0 2024-10-08 05:10:43,254 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.912e+02 4.785e+02 5.591e+02 6.404e+02 8.798e+02, threshold=1.118e+03, percent-clipped=0.0 2024-10-08 05:10:48,651 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=110952.0, ans=0.125 2024-10-08 05:10:50,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=110952.0, ans=0.125 2024-10-08 05:11:11,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=111072.0, ans=0.0 2024-10-08 05:11:19,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=111192.0, ans=0.0 2024-10-08 05:11:36,244 INFO [train.py:1136] (0/2) Epoch 12, batch 300, loss[loss=0.2058, simple_loss=0.3032, pruned_loss=0.05415, over 87301.00 frames. ], tot_loss[loss=0.2246, simple_loss=0.3162, pruned_loss=0.06645, over 13332701.77 frames. ], batch size: 372, lr: 1.96e-02, grad_scale: 16.0 2024-10-08 05:12:19,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=111552.0, ans=0.0 2024-10-08 05:12:29,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=111552.0, ans=0.125 2024-10-08 05:12:36,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=111672.0, ans=0.0 2024-10-08 05:12:42,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=111672.0, ans=0.125 2024-10-08 05:13:11,765 INFO [train.py:1136] (0/2) Epoch 12, batch 350, loss[loss=0.2021, simple_loss=0.2906, pruned_loss=0.05678, over 86677.00 frames. ], tot_loss[loss=0.2243, simple_loss=0.3159, pruned_loss=0.0663, over 14195845.26 frames. ], batch size: 246, lr: 1.96e-02, grad_scale: 16.0 2024-10-08 05:13:44,073 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.58 vs. 
limit=15.0 2024-10-08 05:13:46,097 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.67 vs. limit=6.0 2024-10-08 05:13:51,694 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.079e+02 4.698e+02 5.281e+02 6.518e+02 1.084e+03, threshold=1.056e+03, percent-clipped=0.0 2024-10-08 05:13:55,604 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=112152.0, ans=0.2 2024-10-08 05:13:57,912 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.94 vs. limit=10.0 2024-10-08 05:14:12,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=112272.0, ans=0.1 2024-10-08 05:14:13,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=112272.0, ans=0.125 2024-10-08 05:14:45,264 INFO [train.py:1136] (0/2) Epoch 12, batch 400, loss[loss=0.225, simple_loss=0.3183, pruned_loss=0.06585, over 86952.00 frames. ], tot_loss[loss=0.2239, simple_loss=0.3157, pruned_loss=0.06603, over 14872440.60 frames. ], batch size: 583, lr: 1.96e-02, grad_scale: 32.0 2024-10-08 05:15:01,146 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.27 vs. limit=15.0 2024-10-08 05:16:21,651 INFO [train.py:1136] (0/2) Epoch 12, batch 450, loss[loss=0.2505, simple_loss=0.3419, pruned_loss=0.07959, over 83214.00 frames. ], tot_loss[loss=0.2236, simple_loss=0.3156, pruned_loss=0.06582, over 15373936.04 frames. ], batch size: 1077, lr: 1.96e-02, grad_scale: 32.0 2024-10-08 05:16:31,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=113112.0, ans=0.1 2024-10-08 05:16:59,335 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-10-08 05:17:04,418 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.944e+02 4.783e+02 5.486e+02 6.543e+02 9.380e+02, threshold=1.097e+03, percent-clipped=0.0 2024-10-08 05:17:44,966 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=113592.0, ans=0.125 2024-10-08 05:17:45,116 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=113592.0, ans=0.125 2024-10-08 05:17:56,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=113712.0, ans=0.1 2024-10-08 05:17:57,316 INFO [train.py:1136] (0/2) Epoch 12, batch 500, loss[loss=0.2358, simple_loss=0.3283, pruned_loss=0.07171, over 84785.00 frames. ], tot_loss[loss=0.2248, simple_loss=0.3165, pruned_loss=0.06654, over 15738605.30 frames. 
], batch size: 958, lr: 1.95e-02, grad_scale: 16.0 2024-10-08 05:17:57,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=113712.0, ans=0.125 2024-10-08 05:18:09,981 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-10-08 05:18:32,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=113952.0, ans=0.125 2024-10-08 05:19:05,754 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=114072.0, ans=0.125 2024-10-08 05:19:27,632 INFO [train.py:1136] (0/2) Epoch 12, batch 550, loss[loss=0.1994, simple_loss=0.2891, pruned_loss=0.05484, over 86724.00 frames. ], tot_loss[loss=0.2233, simple_loss=0.3151, pruned_loss=0.06572, over 16084591.11 frames. ], batch size: 229, lr: 1.95e-02, grad_scale: 16.0 2024-10-08 05:19:48,948 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.59 vs. limit=15.0 2024-10-08 05:20:12,310 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.155e+02 5.216e+02 6.028e+02 7.643e+02 1.517e+03, threshold=1.206e+03, percent-clipped=6.0 2024-10-08 05:20:42,969 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.15 vs. limit=10.0 2024-10-08 05:20:51,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=114792.0, ans=10.0 2024-10-08 05:20:54,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=114792.0, ans=0.025 2024-10-08 05:21:06,748 INFO [train.py:1136] (0/2) Epoch 12, batch 600, loss[loss=0.2175, simple_loss=0.3044, pruned_loss=0.06529, over 87377.00 frames. ], tot_loss[loss=0.2246, simple_loss=0.3159, pruned_loss=0.0666, over 16258558.29 frames. ], batch size: 280, lr: 1.95e-02, grad_scale: 16.0 2024-10-08 05:21:17,753 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-10-08 05:21:25,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=115032.0, ans=0.125 2024-10-08 05:21:28,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=115032.0, ans=0.125 2024-10-08 05:22:35,477 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=115392.0, ans=0.05 2024-10-08 05:22:41,927 INFO [train.py:1136] (0/2) Epoch 12, batch 650, loss[loss=0.2262, simple_loss=0.3169, pruned_loss=0.06773, over 87029.00 frames. ], tot_loss[loss=0.2243, simple_loss=0.3154, pruned_loss=0.06658, over 16450791.64 frames. ], batch size: 548, lr: 1.94e-02, grad_scale: 16.0 2024-10-08 05:22:44,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=115512.0, ans=0.125 2024-10-08 05:22:55,078 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. 
limit=6.0 2024-10-08 05:22:56,165 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=115512.0, ans=0.125 2024-10-08 05:23:16,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=115632.0, ans=0.1 2024-10-08 05:23:26,245 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.881e+02 4.923e+02 5.941e+02 6.849e+02 1.480e+03, threshold=1.188e+03, percent-clipped=2.0 2024-10-08 05:23:26,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=115752.0, ans=0.1 2024-10-08 05:23:29,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=115752.0, ans=0.2 2024-10-08 05:23:32,067 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.80 vs. limit=15.0 2024-10-08 05:23:37,971 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=115872.0, ans=0.125 2024-10-08 05:23:39,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=115872.0, ans=0.0 2024-10-08 05:23:47,882 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.39 vs. limit=15.0 2024-10-08 05:23:49,195 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=115872.0, ans=0.125 2024-10-08 05:23:55,503 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=115992.0, ans=0.125 2024-10-08 05:24:09,463 INFO [train.py:1136] (0/2) Epoch 12, batch 700, loss[loss=0.227, simple_loss=0.3223, pruned_loss=0.06589, over 86061.00 frames. ], tot_loss[loss=0.224, simple_loss=0.3153, pruned_loss=0.06638, over 16609510.17 frames. ], batch size: 721, lr: 1.94e-02, grad_scale: 16.0 2024-10-08 05:24:19,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=116112.0, ans=0.04949747468305833 2024-10-08 05:24:33,039 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.27 vs. limit=15.0 2024-10-08 05:24:43,751 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=116352.0, ans=0.125 2024-10-08 05:25:04,694 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.55 vs. limit=15.0 2024-10-08 05:25:15,408 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=116592.0, ans=0.125 2024-10-08 05:25:33,693 INFO [train.py:1136] (0/2) Epoch 12, batch 750, loss[loss=0.2351, simple_loss=0.3286, pruned_loss=0.07074, over 85929.00 frames. ], tot_loss[loss=0.2248, simple_loss=0.3158, pruned_loss=0.06686, over 16685410.76 frames. 
], batch size: 721, lr: 1.94e-02, grad_scale: 16.0 2024-10-08 05:25:33,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=116712.0, ans=0.1 2024-10-08 05:25:48,208 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=116832.0, ans=0.125 2024-10-08 05:26:09,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=116952.0, ans=0.025 2024-10-08 05:26:10,788 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.028e+02 4.758e+02 5.283e+02 6.355e+02 9.475e+02, threshold=1.057e+03, percent-clipped=0.0 2024-10-08 05:26:15,874 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=116952.0, ans=0.0 2024-10-08 05:26:44,863 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-10-08 05:26:49,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=117192.0, ans=0.2 2024-10-08 05:26:54,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=117312.0, ans=0.0 2024-10-08 05:26:55,736 INFO [train.py:1136] (0/2) Epoch 12, batch 800, loss[loss=0.2721, simple_loss=0.3473, pruned_loss=0.09844, over 69581.00 frames. ], tot_loss[loss=0.2243, simple_loss=0.3155, pruned_loss=0.06658, over 16746992.44 frames. ], batch size: 1960, lr: 1.93e-02, grad_scale: 32.0 2024-10-08 05:26:57,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=117312.0, ans=0.1 2024-10-08 05:27:22,453 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-12.pt 2024-10-08 05:28:20,757 INFO [train.py:1136] (0/2) Epoch 13, batch 0, loss[loss=0.2104, simple_loss=0.3123, pruned_loss=0.05424, over 87172.00 frames. ], tot_loss[loss=0.2104, simple_loss=0.3123, pruned_loss=0.05424, over 87172.00 frames. ], batch size: 439, lr: 1.86e-02, grad_scale: 32.0 2024-10-08 05:28:20,758 INFO [train.py:1159] (0/2) Computing validation loss 2024-10-08 05:28:31,608 INFO [train.py:1168] (0/2) Epoch 13, validation: loss=0.1807, simple_loss=0.297, pruned_loss=0.0322, over 1382211.00 frames. 
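The recurring WARNING records from optim.py:487 report grad-norm quartiles (min/25%/50%/75%/max over a recent window of batches), a clipping threshold, and the percentage of batches clipped since the previous report. In every such record above, the threshold is exactly Clipping_scale times the median quartile (e.g. 1.221e+03 = 2.0 x 6.106e+02, and 1.308e+03 = 2.0 x 6.542e+02), so a minimal sketch of that bookkeeping looks like the following. The class and parameter names (QuartileGradClipper, window_size) are assumptions for illustration, not taken from icefall's optim.py:

```python
import torch
from collections import deque

class QuartileGradClipper:
    """Sketch of quartile-based gradient clipping: keep a window of recent
    global grad norms and clip against threshold = clipping_scale * median,
    matching the relation visible in the logged WARNING records."""

    def __init__(self, clipping_scale: float = 2.0, window_size: int = 400):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window_size)
        self.seen = 0      # batches since last report
        self.clipped = 0   # of which were clipped -> "percent-clipped"

    def clip_(self, params):
        grads = [p.grad for p in params if p.grad is not None]
        # Global grad norm across all parameters.
        norm = torch.norm(torch.stack([g.detach().norm() for g in grads])).item()
        self.norms.append(norm)
        # min / 25% / 50% / 75% / max over the recent window,
        # as printed in the "grad-norm quartiles ..." records.
        qs = torch.quantile(
            torch.tensor(list(self.norms), dtype=torch.float32),
            torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]),
        )
        threshold = self.clipping_scale * qs[2].item()  # scale * median
        self.seen += 1
        if norm > threshold:
            self.clipped += 1
            for g in grads:
                g.mul_(threshold / norm)  # rescale so the norm equals threshold
        return qs, threshold, 100.0 * self.clipped / self.seen
```

Recomputing the quantiles on every step, as above, is only for clarity; presumably the real optimizer refreshes them at the reporting interval, which would explain why the quartiles drift only gradually from one WARNING record to the next.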
2024-10-08 05:28:31,609 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB 2024-10-08 05:28:33,747 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=117504.0, ans=0.125 2024-10-08 05:28:34,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=117504.0, ans=6.0 2024-10-08 05:28:38,871 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=117504.0, ans=0.125 2024-10-08 05:29:05,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=117624.0, ans=0.0 2024-10-08 05:29:19,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=117744.0, ans=0.125 2024-10-08 05:29:30,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=117864.0, ans=0.125 2024-10-08 05:29:34,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=117864.0, ans=0.0 2024-10-08 05:29:47,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=117984.0, ans=0.125 2024-10-08 05:30:00,576 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.22 vs. limit=15.0 2024-10-08 05:30:03,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=117984.0, ans=0.125 2024-10-08 05:30:06,117 INFO [train.py:1136] (0/2) Epoch 13, batch 50, loss[loss=0.2112, simple_loss=0.3135, pruned_loss=0.05447, over 87212.00 frames. ], tot_loss[loss=0.2224, simple_loss=0.3149, pruned_loss=0.06497, over 3877138.81 frames. ], batch size: 517, lr: 1.85e-02, grad_scale: 16.0 2024-10-08 05:30:21,980 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.798e+02 4.837e+02 5.648e+02 6.521e+02 8.949e+02, threshold=1.130e+03, percent-clipped=0.0 2024-10-08 05:30:46,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=118344.0, ans=0.125 2024-10-08 05:31:32,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=118584.0, ans=0.0 2024-10-08 05:31:45,040 INFO [train.py:1136] (0/2) Epoch 13, batch 100, loss[loss=0.1994, simple_loss=0.2947, pruned_loss=0.05204, over 87245.00 frames. ], tot_loss[loss=0.2199, simple_loss=0.3119, pruned_loss=0.06394, over 6775268.43 frames. 
], batch size: 264, lr: 1.85e-02, grad_scale: 8.0 2024-10-08 05:31:45,528 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=118704.0, ans=0.1 2024-10-08 05:31:51,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=118704.0, ans=0.125 2024-10-08 05:32:14,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=118824.0, ans=0.125 2024-10-08 05:32:44,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=119064.0, ans=0.0 2024-10-08 05:32:46,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=119064.0, ans=0.1 2024-10-08 05:32:52,344 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=3.04 vs. limit=12.0 2024-10-08 05:32:59,865 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=119184.0, ans=0.0 2024-10-08 05:33:09,934 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=119184.0, ans=0.04949747468305833 2024-10-08 05:33:17,108 INFO [train.py:1136] (0/2) Epoch 13, batch 150, loss[loss=0.2022, simple_loss=0.2929, pruned_loss=0.05576, over 86141.00 frames. ], tot_loss[loss=0.2195, simple_loss=0.3119, pruned_loss=0.06354, over 9075630.51 frames. ], batch size: 197, lr: 1.85e-02, grad_scale: 8.0 2024-10-08 05:33:36,573 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.259e+02 5.031e+02 6.179e+02 7.510e+02 1.214e+03, threshold=1.236e+03, percent-clipped=2.0 2024-10-08 05:33:59,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=119544.0, ans=0.2 2024-10-08 05:34:03,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=119544.0, ans=0.0 2024-10-08 05:34:50,755 INFO [train.py:1136] (0/2) Epoch 13, batch 200, loss[loss=0.2753, simple_loss=0.3496, pruned_loss=0.1005, over 69175.00 frames. ], tot_loss[loss=0.2198, simple_loss=0.312, pruned_loss=0.06382, over 10842278.60 frames. ], batch size: 1960, lr: 1.84e-02, grad_scale: 8.0 2024-10-08 05:35:02,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=119904.0, ans=0.125 2024-10-08 05:35:10,774 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=120024.0, ans=0.0 2024-10-08 05:35:25,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=120024.0, ans=0.125 2024-10-08 05:35:26,809 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=120024.0, ans=0.04949747468305833 2024-10-08 05:35:28,607 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.17 vs. 
limit=15.0 2024-10-08 05:36:25,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=120504.0, ans=0.04949747468305833 2024-10-08 05:36:26,122 INFO [train.py:1136] (0/2) Epoch 13, batch 250, loss[loss=0.2109, simple_loss=0.3042, pruned_loss=0.0588, over 87177.00 frames. ], tot_loss[loss=0.2196, simple_loss=0.3118, pruned_loss=0.06374, over 12253468.47 frames. ], batch size: 330, lr: 1.84e-02, grad_scale: 8.0 2024-10-08 05:36:33,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=120504.0, ans=10.0 2024-10-08 05:36:35,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=120504.0, ans=0.125 2024-10-08 05:36:43,038 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.117e+02 4.970e+02 5.551e+02 6.744e+02 1.307e+03, threshold=1.110e+03, percent-clipped=1.0 2024-10-08 05:36:45,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=120624.0, ans=0.1 2024-10-08 05:36:47,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=120624.0, ans=0.125 2024-10-08 05:37:01,974 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.12 vs. limit=10.0 2024-10-08 05:37:14,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=120744.0, ans=0.0 2024-10-08 05:37:16,019 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=120744.0, ans=0.125 2024-10-08 05:37:21,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=120864.0, ans=0.0 2024-10-08 05:37:31,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=120864.0, ans=0.025 2024-10-08 05:38:00,365 INFO [train.py:1136] (0/2) Epoch 13, batch 300, loss[loss=0.2643, simple_loss=0.3417, pruned_loss=0.0934, over 69622.00 frames. ], tot_loss[loss=0.2195, simple_loss=0.3116, pruned_loss=0.06375, over 13313418.55 frames. ], batch size: 1960, lr: 1.84e-02, grad_scale: 8.0 2024-10-08 05:38:03,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=121104.0, ans=0.125 2024-10-08 05:38:05,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=121104.0, ans=0.125 2024-10-08 05:38:26,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=121224.0, ans=0.0 2024-10-08 05:38:44,752 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.35 vs. limit=15.0 2024-10-08 05:39:17,389 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=121584.0, ans=0.025 2024-10-08 05:39:23,182 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.10 vs. 
limit=15.0 2024-10-08 05:39:36,364 INFO [train.py:1136] (0/2) Epoch 13, batch 350, loss[loss=0.2105, simple_loss=0.3051, pruned_loss=0.05795, over 87283.00 frames. ], tot_loss[loss=0.22, simple_loss=0.3121, pruned_loss=0.06395, over 14141787.23 frames. ], batch size: 372, lr: 1.84e-02, grad_scale: 8.0 2024-10-08 05:39:53,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=121704.0, ans=0.07 2024-10-08 05:39:56,018 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.025e+02 5.271e+02 6.048e+02 7.108e+02 8.803e+02, threshold=1.210e+03, percent-clipped=0.0 2024-10-08 05:40:24,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=121944.0, ans=0.0 2024-10-08 05:40:34,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=122064.0, ans=0.2 2024-10-08 05:40:41,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=122064.0, ans=0.0 2024-10-08 05:41:08,607 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=122304.0, ans=0.125 2024-10-08 05:41:09,837 INFO [train.py:1136] (0/2) Epoch 13, batch 400, loss[loss=0.2104, simple_loss=0.3013, pruned_loss=0.05979, over 87213.00 frames. ], tot_loss[loss=0.2207, simple_loss=0.3125, pruned_loss=0.06451, over 14781704.48 frames. ], batch size: 296, lr: 1.83e-02, grad_scale: 16.0 2024-10-08 05:41:12,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=122304.0, ans=0.125 2024-10-08 05:41:17,607 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=122304.0, ans=0.1 2024-10-08 05:41:20,978 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=122304.0, ans=0.1 2024-10-08 05:41:22,744 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=122304.0, ans=0.0 2024-10-08 05:42:05,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=122664.0, ans=0.1 2024-10-08 05:42:09,277 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.71 vs. limit=12.0 2024-10-08 05:42:12,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=122664.0, ans=0.125 2024-10-08 05:42:30,881 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=122784.0, ans=0.125 2024-10-08 05:42:31,599 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.36 vs. limit=12.0 2024-10-08 05:42:32,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=122784.0, ans=0.0 2024-10-08 05:42:45,330 INFO [train.py:1136] (0/2) Epoch 13, batch 450, loss[loss=0.209, simple_loss=0.306, pruned_loss=0.056, over 87302.00 frames. ], tot_loss[loss=0.2211, simple_loss=0.313, pruned_loss=0.06463, over 15266057.78 frames. 
], batch size: 415, lr: 1.83e-02, grad_scale: 16.0 2024-10-08 05:43:02,762 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.997e+02 4.668e+02 5.387e+02 6.062e+02 9.597e+02, threshold=1.077e+03, percent-clipped=0.0 2024-10-08 05:43:03,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=123024.0, ans=0.125 2024-10-08 05:44:00,403 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0 2024-10-08 05:44:21,797 INFO [train.py:1136] (0/2) Epoch 13, batch 500, loss[loss=0.225, simple_loss=0.317, pruned_loss=0.06646, over 86379.00 frames. ], tot_loss[loss=0.2209, simple_loss=0.3127, pruned_loss=0.06459, over 15672630.53 frames. ], batch size: 620, lr: 1.83e-02, grad_scale: 16.0 2024-10-08 05:44:22,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=123504.0, ans=0.125 2024-10-08 05:44:24,804 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.68 vs. limit=22.5 2024-10-08 05:44:39,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=123624.0, ans=0.125 2024-10-08 05:44:40,434 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.64 vs. limit=22.5 2024-10-08 05:44:54,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=123744.0, ans=0.0 2024-10-08 05:45:03,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=123744.0, ans=0.025 2024-10-08 05:45:20,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=123864.0, ans=0.125 2024-10-08 05:45:47,974 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=123984.0, ans=0.125 2024-10-08 05:45:52,556 INFO [train.py:1136] (0/2) Epoch 13, batch 550, loss[loss=0.2036, simple_loss=0.3019, pruned_loss=0.05264, over 87464.00 frames. ], tot_loss[loss=0.2193, simple_loss=0.3114, pruned_loss=0.06359, over 16001217.05 frames. ], batch size: 415, lr: 1.82e-02, grad_scale: 16.0 2024-10-08 05:46:14,672 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.930e+02 4.626e+02 5.266e+02 6.275e+02 9.859e+02, threshold=1.053e+03, percent-clipped=0.0 2024-10-08 05:46:37,681 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=124344.0, ans=0.0 2024-10-08 05:46:45,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=124344.0, ans=0.1 2024-10-08 05:47:18,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=124584.0, ans=0.125 2024-10-08 05:47:20,740 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=124584.0, ans=0.07 2024-10-08 05:47:29,067 INFO [train.py:1136] (0/2) Epoch 13, batch 600, loss[loss=0.2023, simple_loss=0.2912, pruned_loss=0.05666, over 86492.00 frames. 
], tot_loss[loss=0.2194, simple_loss=0.3113, pruned_loss=0.06376, over 16240930.34 frames. ], batch size: 213, lr: 1.82e-02, grad_scale: 16.0 2024-10-08 05:47:34,880 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=124704.0, ans=0.0 2024-10-08 05:47:36,577 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=124704.0, ans=0.1 2024-10-08 05:47:54,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=124824.0, ans=10.0 2024-10-08 05:48:07,136 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=124944.0, ans=0.1 2024-10-08 05:48:12,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=124944.0, ans=0.95 2024-10-08 05:48:14,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=124944.0, ans=0.0 2024-10-08 05:48:26,685 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=125064.0, ans=0.0 2024-10-08 05:48:31,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=125064.0, ans=0.0 2024-10-08 05:48:59,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=125184.0, ans=0.125 2024-10-08 05:49:05,938 INFO [train.py:1136] (0/2) Epoch 13, batch 650, loss[loss=0.2092, simple_loss=0.301, pruned_loss=0.05866, over 87310.00 frames. ], tot_loss[loss=0.2198, simple_loss=0.3117, pruned_loss=0.06401, over 16412898.65 frames. ], batch size: 296, lr: 1.82e-02, grad_scale: 16.0 2024-10-08 05:49:09,733 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=125304.0, ans=0.1 2024-10-08 05:49:22,892 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.943e+02 4.888e+02 5.477e+02 6.826e+02 1.288e+03, threshold=1.095e+03, percent-clipped=2.0 2024-10-08 05:49:41,499 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=125424.0, ans=0.0 2024-10-08 05:49:41,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=125424.0, ans=0.0 2024-10-08 05:50:05,838 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.70 vs. limit=10.0 2024-10-08 05:50:17,193 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=6.44 vs. limit=15.0 2024-10-08 05:50:21,273 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=125784.0, ans=0.0 2024-10-08 05:50:33,632 INFO [train.py:1136] (0/2) Epoch 13, batch 700, loss[loss=0.2402, simple_loss=0.332, pruned_loss=0.07417, over 85382.00 frames. ], tot_loss[loss=0.2193, simple_loss=0.311, pruned_loss=0.06378, over 16570539.79 frames. 
], batch size: 866, lr: 1.82e-02, grad_scale: 16.0 2024-10-08 05:50:49,974 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=126024.0, ans=0.2 2024-10-08 05:50:58,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=126024.0, ans=0.5 2024-10-08 05:51:26,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=126264.0, ans=0.125 2024-10-08 05:51:26,100 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=126264.0, ans=0.2 2024-10-08 05:51:58,717 INFO [train.py:1136] (0/2) Epoch 13, batch 750, loss[loss=0.2266, simple_loss=0.3191, pruned_loss=0.067, over 86161.00 frames. ], tot_loss[loss=0.2196, simple_loss=0.3116, pruned_loss=0.06382, over 16673088.08 frames. ], batch size: 667, lr: 1.81e-02, grad_scale: 16.0 2024-10-08 05:52:05,886 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.99 vs. limit=15.0 2024-10-08 05:52:15,931 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.885e+02 4.796e+02 5.799e+02 6.818e+02 1.032e+03, threshold=1.160e+03, percent-clipped=0.0 2024-10-08 05:52:22,017 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.17 vs. limit=15.0 2024-10-08 05:52:31,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=126744.0, ans=0.0 2024-10-08 05:52:31,740 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.13 vs. limit=12.0 2024-10-08 05:52:44,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=126744.0, ans=0.0 2024-10-08 05:52:46,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=126744.0, ans=0.125 2024-10-08 05:52:48,748 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=126864.0, ans=0.125 2024-10-08 05:52:50,953 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.21 vs. limit=15.0 2024-10-08 05:52:53,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=126864.0, ans=0.125 2024-10-08 05:53:06,223 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=126984.0, ans=0.0 2024-10-08 05:53:14,946 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=126984.0, ans=0.125 2024-10-08 05:53:18,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=126984.0, ans=0.125 2024-10-08 05:53:22,844 INFO [train.py:1136] (0/2) Epoch 13, batch 800, loss[loss=0.2125, simple_loss=0.3023, pruned_loss=0.06135, over 87115.00 frames. ], tot_loss[loss=0.2223, simple_loss=0.314, pruned_loss=0.06529, over 16718519.23 frames. 
], batch size: 280, lr: 1.81e-02, grad_scale: 32.0 2024-10-08 05:53:45,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=127224.0, ans=0.125 2024-10-08 05:53:49,060 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-13.pt 2024-10-08 05:54:40,624 INFO [train.py:1136] (0/2) Epoch 14, batch 0, loss[loss=0.2159, simple_loss=0.3131, pruned_loss=0.05937, over 86366.00 frames. ], tot_loss[loss=0.2159, simple_loss=0.3131, pruned_loss=0.05937, over 86366.00 frames. ], batch size: 668, lr: 1.74e-02, grad_scale: 32.0 2024-10-08 05:54:40,625 INFO [train.py:1159] (0/2) Computing validation loss 2024-10-08 05:54:51,469 INFO [train.py:1168] (0/2) Epoch 14, validation: loss=0.176, simple_loss=0.2914, pruned_loss=0.03029, over 1382211.00 frames. 2024-10-08 05:54:51,470 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB 2024-10-08 05:55:18,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=127416.0, ans=0.2 2024-10-08 05:55:57,434 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.75 vs. limit=15.0 2024-10-08 05:56:15,060 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.833e+02 4.926e+02 6.024e+02 7.499e+02 1.306e+03, threshold=1.205e+03, percent-clipped=1.0 2024-10-08 05:56:24,433 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.59 vs. limit=22.5 2024-10-08 05:56:28,423 INFO [train.py:1136] (0/2) Epoch 14, batch 50, loss[loss=0.2021, simple_loss=0.2987, pruned_loss=0.05277, over 87505.00 frames. ], tot_loss[loss=0.215, simple_loss=0.3086, pruned_loss=0.06074, over 3874632.07 frames. ], batch size: 393, lr: 1.74e-02, grad_scale: 32.0 2024-10-08 05:56:42,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=127896.0, ans=0.0 2024-10-08 05:56:57,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=128016.0, ans=0.07 2024-10-08 05:57:04,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=128136.0, ans=0.2 2024-10-08 05:57:04,696 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.73 vs. limit=15.0 2024-10-08 05:57:13,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=128136.0, ans=0.125 2024-10-08 05:57:15,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=128136.0, ans=0.0 2024-10-08 05:57:41,242 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.64 vs. limit=15.0 2024-10-08 05:57:45,129 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.78 vs. limit=22.5 2024-10-08 05:58:01,626 INFO [train.py:1136] (0/2) Epoch 14, batch 100, loss[loss=0.227, simple_loss=0.321, pruned_loss=0.06652, over 86323.00 frames. ], tot_loss[loss=0.2164, simple_loss=0.3094, pruned_loss=0.06168, over 6796131.86 frames. 
], batch size: 668, lr: 1.74e-02, grad_scale: 32.0 2024-10-08 05:58:48,681 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=128736.0, ans=0.125 2024-10-08 05:59:03,939 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.03 vs. limit=22.5 2024-10-08 05:59:18,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=128976.0, ans=0.0 2024-10-08 05:59:25,202 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.52 vs. limit=12.0 2024-10-08 05:59:27,716 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.067e+02 5.072e+02 5.975e+02 7.040e+02 9.815e+02, threshold=1.195e+03, percent-clipped=0.0 2024-10-08 05:59:35,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=128976.0, ans=0.125 2024-10-08 05:59:35,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=128976.0, ans=0.02 2024-10-08 05:59:37,859 INFO [train.py:1136] (0/2) Epoch 14, batch 150, loss[loss=0.2264, simple_loss=0.3186, pruned_loss=0.06713, over 86290.00 frames. ], tot_loss[loss=0.2169, simple_loss=0.3099, pruned_loss=0.06194, over 9034651.15 frames. ], batch size: 667, lr: 1.74e-02, grad_scale: 32.0 2024-10-08 05:59:46,637 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.67 vs. limit=6.0 2024-10-08 06:00:01,484 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=129216.0, ans=0.025 2024-10-08 06:00:04,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=129216.0, ans=0.125 2024-10-08 06:00:31,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=129336.0, ans=0.0 2024-10-08 06:00:44,081 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=129456.0, ans=0.125 2024-10-08 06:00:49,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=129456.0, ans=0.07 2024-10-08 06:00:52,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=129576.0, ans=0.125 2024-10-08 06:01:13,451 INFO [train.py:1136] (0/2) Epoch 14, batch 200, loss[loss=0.2057, simple_loss=0.3024, pruned_loss=0.05448, over 87126.00 frames. ], tot_loss[loss=0.2153, simple_loss=0.3084, pruned_loss=0.06105, over 10838234.19 frames. ], batch size: 350, lr: 1.73e-02, grad_scale: 16.0 2024-10-08 06:01:27,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=129696.0, ans=0.1 2024-10-08 06:01:27,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=129696.0, ans=0.0 2024-10-08 06:01:55,969 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.08 vs. 
limit=15.0 2024-10-08 06:02:09,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=130056.0, ans=10.0 2024-10-08 06:02:18,238 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.57 vs. limit=15.0 2024-10-08 06:02:40,486 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.045e+02 5.157e+02 5.878e+02 6.748e+02 1.993e+03, threshold=1.176e+03, percent-clipped=1.0 2024-10-08 06:02:49,841 INFO [train.py:1136] (0/2) Epoch 14, batch 250, loss[loss=0.2004, simple_loss=0.2868, pruned_loss=0.05704, over 85735.00 frames. ], tot_loss[loss=0.2152, simple_loss=0.3082, pruned_loss=0.06111, over 12229567.93 frames. ], batch size: 180, lr: 1.73e-02, grad_scale: 8.0 2024-10-08 06:02:50,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=130296.0, ans=0.0 2024-10-08 06:03:15,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=130416.0, ans=0.0 2024-10-08 06:03:28,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=130536.0, ans=0.025 2024-10-08 06:03:34,906 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2024-10-08 06:04:15,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=130776.0, ans=0.125 2024-10-08 06:04:20,309 INFO [train.py:1136] (0/2) Epoch 14, batch 300, loss[loss=0.2161, simple_loss=0.309, pruned_loss=0.06163, over 86883.00 frames. ], tot_loss[loss=0.2135, simple_loss=0.3066, pruned_loss=0.06021, over 13361653.07 frames. ], batch size: 547, lr: 1.73e-02, grad_scale: 8.0 2024-10-08 06:05:21,241 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=131256.0, ans=0.1 2024-10-08 06:05:48,809 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.913e+02 5.012e+02 5.588e+02 6.327e+02 8.335e+02, threshold=1.118e+03, percent-clipped=0.0 2024-10-08 06:05:52,791 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=131376.0, ans=0.1 2024-10-08 06:05:55,606 INFO [train.py:1136] (0/2) Epoch 14, batch 350, loss[loss=0.2623, simple_loss=0.3417, pruned_loss=0.09149, over 69103.00 frames. ], tot_loss[loss=0.2128, simple_loss=0.3058, pruned_loss=0.0599, over 14207577.30 frames. 
], batch size: 1960, lr: 1.72e-02, grad_scale: 8.0 2024-10-08 06:05:56,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=131496.0, ans=0.125 2024-10-08 06:06:39,855 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=131736.0, ans=0.0 2024-10-08 06:06:39,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=131736.0, ans=0.125 2024-10-08 06:06:55,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=131856.0, ans=0.125 2024-10-08 06:07:31,637 INFO [train.py:1136] (0/2) Epoch 14, batch 400, loss[loss=0.2153, simple_loss=0.3096, pruned_loss=0.0605, over 86929.00 frames. ], tot_loss[loss=0.2137, simple_loss=0.3069, pruned_loss=0.06025, over 14851379.43 frames. ], batch size: 583, lr: 1.72e-02, grad_scale: 16.0 2024-10-08 06:08:03,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=132216.0, ans=0.125 2024-10-08 06:08:09,285 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=132216.0, ans=0.2 2024-10-08 06:08:29,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=132456.0, ans=0.025 2024-10-08 06:08:55,485 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=132576.0, ans=0.125 2024-10-08 06:09:01,677 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.965e+02 4.875e+02 5.666e+02 6.794e+02 8.640e+02, threshold=1.133e+03, percent-clipped=0.0 2024-10-08 06:09:03,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=132576.0, ans=0.0 2024-10-08 06:09:08,450 INFO [train.py:1136] (0/2) Epoch 14, batch 450, loss[loss=0.2015, simple_loss=0.2969, pruned_loss=0.05305, over 87438.00 frames. ], tot_loss[loss=0.2153, simple_loss=0.308, pruned_loss=0.06129, over 15283875.48 frames. ], batch size: 372, lr: 1.72e-02, grad_scale: 16.0 2024-10-08 06:09:17,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=132696.0, ans=0.125 2024-10-08 06:10:00,846 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.12 vs. limit=15.0 2024-10-08 06:10:03,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=133056.0, ans=0.125 2024-10-08 06:10:08,713 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=133056.0, ans=0.1 2024-10-08 06:10:22,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=133176.0, ans=0.125 2024-10-08 06:10:25,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=133176.0, ans=0.5 2024-10-08 06:10:41,313 INFO [train.py:1136] (0/2) Epoch 14, batch 500, loss[loss=0.1983, simple_loss=0.2972, pruned_loss=0.04968, over 87371.00 frames. ], tot_loss[loss=0.215, simple_loss=0.3078, pruned_loss=0.06108, over 15691189.93 frames. 
], batch size: 393, lr: 1.72e-02, grad_scale: 16.0 2024-10-08 06:11:01,856 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.94 vs. limit=22.5 2024-10-08 06:11:15,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=133416.0, ans=0.0 2024-10-08 06:11:19,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=133536.0, ans=0.125 2024-10-08 06:11:42,959 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.22 vs. limit=15.0 2024-10-08 06:11:44,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=133656.0, ans=0.07 2024-10-08 06:12:07,883 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.030e+02 4.599e+02 5.166e+02 6.081e+02 9.015e+02, threshold=1.033e+03, percent-clipped=0.0 2024-10-08 06:12:14,711 INFO [train.py:1136] (0/2) Epoch 14, batch 550, loss[loss=0.221, simple_loss=0.3131, pruned_loss=0.06444, over 86856.00 frames. ], tot_loss[loss=0.215, simple_loss=0.3078, pruned_loss=0.0611, over 16005770.95 frames. ], batch size: 583, lr: 1.71e-02, grad_scale: 16.0 2024-10-08 06:12:17,050 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=133896.0, ans=0.125 2024-10-08 06:12:21,035 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-10-08 06:12:21,063 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=133896.0, ans=0.125 2024-10-08 06:13:18,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=134256.0, ans=0.95 2024-10-08 06:13:34,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=134376.0, ans=0.1 2024-10-08 06:13:47,236 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.87 vs. limit=6.0 2024-10-08 06:13:47,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=134376.0, ans=0.125 2024-10-08 06:13:53,356 INFO [train.py:1136] (0/2) Epoch 14, batch 600, loss[loss=0.2273, simple_loss=0.319, pruned_loss=0.06777, over 85813.00 frames. ], tot_loss[loss=0.2141, simple_loss=0.3074, pruned_loss=0.06043, over 16292048.84 frames. ], batch size: 721, lr: 1.71e-02, grad_scale: 16.0 2024-10-08 06:13:59,346 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=134496.0, ans=0.0 2024-10-08 06:14:14,694 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=134616.0, ans=0.0 2024-10-08 06:14:46,258 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.01 vs. 
limit=22.5 2024-10-08 06:15:17,656 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.002e+02 4.687e+02 5.181e+02 5.969e+02 1.167e+03, threshold=1.036e+03, percent-clipped=2.0 2024-10-08 06:15:26,982 INFO [train.py:1136] (0/2) Epoch 14, batch 650, loss[loss=0.2051, simple_loss=0.3046, pruned_loss=0.05275, over 87326.00 frames. ], tot_loss[loss=0.2137, simple_loss=0.3069, pruned_loss=0.06021, over 16492795.95 frames. ], batch size: 464, lr: 1.71e-02, grad_scale: 16.0 2024-10-08 06:15:30,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=135096.0, ans=0.125 2024-10-08 06:16:00,441 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=135216.0, ans=0.0 2024-10-08 06:16:19,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=135456.0, ans=0.125 2024-10-08 06:16:19,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=135456.0, ans=0.0 2024-10-08 06:16:25,570 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=135456.0, ans=10.0 2024-10-08 06:16:52,480 INFO [train.py:1136] (0/2) Epoch 14, batch 700, loss[loss=0.2046, simple_loss=0.3024, pruned_loss=0.05337, over 87313.00 frames. ], tot_loss[loss=0.2139, simple_loss=0.3071, pruned_loss=0.06032, over 16644183.31 frames. ], batch size: 464, lr: 1.71e-02, grad_scale: 16.0 2024-10-08 06:16:57,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=135696.0, ans=0.025 2024-10-08 06:17:01,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=135696.0, ans=0.125 2024-10-08 06:17:05,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=135696.0, ans=0.125 2024-10-08 06:17:07,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=135816.0, ans=0.125 2024-10-08 06:17:20,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=135816.0, ans=0.125 2024-10-08 06:17:43,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=136056.0, ans=0.125 2024-10-08 06:17:53,161 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.08 vs. limit=6.0 2024-10-08 06:18:05,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=136176.0, ans=0.125 2024-10-08 06:18:11,463 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.699e+02 4.671e+02 5.236e+02 6.081e+02 8.961e+02, threshold=1.047e+03, percent-clipped=0.0 2024-10-08 06:18:13,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=136176.0, ans=0.07 2024-10-08 06:18:17,679 INFO [train.py:1136] (0/2) Epoch 14, batch 750, loss[loss=0.2185, simple_loss=0.3127, pruned_loss=0.06219, over 86960.00 frames. ], tot_loss[loss=0.2144, simple_loss=0.3076, pruned_loss=0.06057, over 16741477.00 frames. 
], batch size: 583, lr: 1.70e-02, grad_scale: 16.0 2024-10-08 06:18:32,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=136416.0, ans=0.125 2024-10-08 06:18:43,851 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.15 vs. limit=15.0 2024-10-08 06:18:57,550 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.51 vs. limit=10.0 2024-10-08 06:19:01,822 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=136536.0, ans=0.125 2024-10-08 06:19:02,050 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=136536.0, ans=0.125 2024-10-08 06:19:20,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=136656.0, ans=0.0 2024-10-08 06:19:41,222 INFO [train.py:1136] (0/2) Epoch 14, batch 800, loss[loss=0.2277, simple_loss=0.3217, pruned_loss=0.06689, over 86013.00 frames. ], tot_loss[loss=0.2164, simple_loss=0.3092, pruned_loss=0.06178, over 16749233.71 frames. ], batch size: 721, lr: 1.70e-02, grad_scale: 32.0 2024-10-08 06:20:05,186 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.19 vs. limit=15.0 2024-10-08 06:20:07,208 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-14.pt 2024-10-08 06:21:02,268 INFO [train.py:1136] (0/2) Epoch 15, batch 0, loss[loss=0.2332, simple_loss=0.3249, pruned_loss=0.07072, over 83513.00 frames. ], tot_loss[loss=0.2332, simple_loss=0.3249, pruned_loss=0.07072, over 83513.00 frames. ], batch size: 1078, lr: 1.64e-02, grad_scale: 32.0 2024-10-08 06:21:02,270 INFO [train.py:1159] (0/2) Computing validation loss 2024-10-08 06:21:13,137 INFO [train.py:1168] (0/2) Epoch 15, validation: loss=0.1743, simple_loss=0.2898, pruned_loss=0.02946, over 1382211.00 frames. 2024-10-08 06:21:13,138 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB 2024-10-08 06:21:15,869 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.04 vs. limit=22.5 2024-10-08 06:21:18,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=137088.0, ans=0.2 2024-10-08 06:21:29,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=137088.0, ans=0.125 2024-10-08 06:21:50,325 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=137328.0, ans=0.09899494936611666 2024-10-08 06:21:50,649 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.47 vs. 
limit=15.0 2024-10-08 06:22:00,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=137328.0, ans=0.0 2024-10-08 06:22:08,428 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.834e+02 5.010e+02 5.922e+02 7.253e+02 1.467e+03, threshold=1.184e+03, percent-clipped=4.0 2024-10-08 06:22:28,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=137568.0, ans=0.125 2024-10-08 06:22:40,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=137568.0, ans=0.0 2024-10-08 06:22:47,138 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=137688.0, ans=0.0 2024-10-08 06:22:48,325 INFO [train.py:1136] (0/2) Epoch 15, batch 50, loss[loss=0.194, simple_loss=0.2932, pruned_loss=0.04738, over 87296.00 frames. ], tot_loss[loss=0.2144, simple_loss=0.3079, pruned_loss=0.06044, over 3862494.34 frames. ], batch size: 464, lr: 1.64e-02, grad_scale: 32.0 2024-10-08 06:23:13,930 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.23 vs. limit=15.0 2024-10-08 06:23:21,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=137808.0, ans=0.025 2024-10-08 06:23:55,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=138048.0, ans=0.125 2024-10-08 06:24:21,516 INFO [train.py:1136] (0/2) Epoch 15, batch 100, loss[loss=0.1978, simple_loss=0.2959, pruned_loss=0.04986, over 87300.00 frames. ], tot_loss[loss=0.2143, simple_loss=0.3077, pruned_loss=0.06047, over 6795143.68 frames. ], batch size: 393, lr: 1.64e-02, grad_scale: 32.0 2024-10-08 06:24:43,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=138408.0, ans=0.125 2024-10-08 06:24:43,681 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=138408.0, ans=0.1 2024-10-08 06:25:21,098 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.889e+02 4.552e+02 5.161e+02 6.183e+02 1.086e+03, threshold=1.032e+03, percent-clipped=0.0 2024-10-08 06:26:00,766 INFO [train.py:1136] (0/2) Epoch 15, batch 150, loss[loss=0.2607, simple_loss=0.3434, pruned_loss=0.08894, over 78553.00 frames. ], tot_loss[loss=0.2157, simple_loss=0.3086, pruned_loss=0.06143, over 9052426.89 frames. ], batch size: 1493, lr: 1.63e-02, grad_scale: 16.0 2024-10-08 06:26:13,528 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=138888.0, ans=0.1 2024-10-08 06:26:50,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=139128.0, ans=0.125 2024-10-08 06:26:51,495 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.87 vs. 
limit=15.0 2024-10-08 06:27:01,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=139248.0, ans=0.0 2024-10-08 06:27:34,561 INFO [train.py:1136] (0/2) Epoch 15, batch 200, loss[loss=0.2081, simple_loss=0.2997, pruned_loss=0.05823, over 87366.00 frames. ], tot_loss[loss=0.2142, simple_loss=0.3073, pruned_loss=0.06055, over 10855611.96 frames. ], batch size: 296, lr: 1.63e-02, grad_scale: 16.0 2024-10-08 06:27:55,231 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.35 vs. limit=15.0 2024-10-08 06:28:04,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=139608.0, ans=0.0 2024-10-08 06:28:05,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=139608.0, ans=0.95 2024-10-08 06:28:34,993 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.980e+02 4.766e+02 5.570e+02 6.962e+02 1.124e+03, threshold=1.114e+03, percent-clipped=1.0 2024-10-08 06:29:01,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=139968.0, ans=0.1 2024-10-08 06:29:07,925 INFO [train.py:1136] (0/2) Epoch 15, batch 250, loss[loss=0.198, simple_loss=0.296, pruned_loss=0.05, over 87374.00 frames. ], tot_loss[loss=0.2136, simple_loss=0.3073, pruned_loss=0.05997, over 12258122.67 frames. ], batch size: 439, lr: 1.63e-02, grad_scale: 16.0 2024-10-08 06:29:21,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=140088.0, ans=0.1 2024-10-08 06:29:28,330 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=140208.0, ans=0.0 2024-10-08 06:29:45,159 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=140328.0, ans=0.0 2024-10-08 06:29:53,930 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=140328.0, ans=0.125 2024-10-08 06:30:06,448 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=140448.0, ans=0.125 2024-10-08 06:30:09,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=140448.0, ans=0.1 2024-10-08 06:30:15,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=140448.0, ans=0.125 2024-10-08 06:30:27,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=140568.0, ans=0.2 2024-10-08 06:30:31,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=140568.0, ans=0.0 2024-10-08 06:30:41,315 INFO [train.py:1136] (0/2) Epoch 15, batch 300, loss[loss=0.2559, simple_loss=0.3345, pruned_loss=0.08862, over 69388.00 frames. ], tot_loss[loss=0.2129, simple_loss=0.3067, pruned_loss=0.05959, over 13342444.00 frames. 
], batch size: 1960, lr: 1.63e-02, grad_scale: 16.0 2024-10-08 06:30:43,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=140688.0, ans=0.125 2024-10-08 06:31:24,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=140928.0, ans=0.125 2024-10-08 06:31:39,085 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.897e+02 4.641e+02 5.191e+02 5.874e+02 1.110e+03, threshold=1.038e+03, percent-clipped=0.0 2024-10-08 06:31:41,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=141048.0, ans=0.95 2024-10-08 06:31:44,484 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=141048.0, ans=0.125 2024-10-08 06:31:48,185 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.22 vs. limit=6.0 2024-10-08 06:32:16,180 INFO [train.py:1136] (0/2) Epoch 15, batch 350, loss[loss=0.2004, simple_loss=0.2943, pruned_loss=0.0532, over 87320.00 frames. ], tot_loss[loss=0.2122, simple_loss=0.306, pruned_loss=0.05927, over 14188960.19 frames. ], batch size: 313, lr: 1.62e-02, grad_scale: 16.0 2024-10-08 06:32:35,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=141408.0, ans=0.1 2024-10-08 06:32:44,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=141408.0, ans=0.125 2024-10-08 06:32:55,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=141528.0, ans=0.0 2024-10-08 06:33:03,166 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=141528.0, ans=0.125 2024-10-08 06:33:46,028 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=141768.0, ans=0.125 2024-10-08 06:33:48,506 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0 2024-10-08 06:33:50,748 INFO [train.py:1136] (0/2) Epoch 15, batch 400, loss[loss=0.2053, simple_loss=0.3038, pruned_loss=0.05338, over 87368.00 frames. ], tot_loss[loss=0.2115, simple_loss=0.3052, pruned_loss=0.05893, over 14847285.15 frames. 
], batch size: 393, lr: 1.62e-02, grad_scale: 32.0 2024-10-08 06:34:03,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=141888.0, ans=0.125 2024-10-08 06:34:51,741 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.984e+02 4.714e+02 5.507e+02 6.406e+02 1.023e+03, threshold=1.101e+03, percent-clipped=0.0 2024-10-08 06:34:52,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=142248.0, ans=0.125 2024-10-08 06:35:15,933 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=142368.0, ans=0.0 2024-10-08 06:35:27,439 INFO [train.py:1136] (0/2) Epoch 15, batch 450, loss[loss=0.2144, simple_loss=0.3149, pruned_loss=0.05694, over 85976.00 frames. ], tot_loss[loss=0.2122, simple_loss=0.3062, pruned_loss=0.05912, over 15321584.23 frames. ], batch size: 721, lr: 1.62e-02, grad_scale: 32.0 2024-10-08 06:35:32,142 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=142488.0, ans=0.2 2024-10-08 06:35:40,912 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=142488.0, ans=0.2 2024-10-08 06:35:46,440 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.31 vs. limit=22.5 2024-10-08 06:36:10,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=142728.0, ans=0.2 2024-10-08 06:36:11,475 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.00 vs. limit=12.0 2024-10-08 06:36:42,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=142968.0, ans=0.125 2024-10-08 06:37:02,547 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.46 vs. limit=15.0 2024-10-08 06:37:03,131 INFO [train.py:1136] (0/2) Epoch 15, batch 500, loss[loss=0.2384, simple_loss=0.3298, pruned_loss=0.07349, over 81746.00 frames. ], tot_loss[loss=0.212, simple_loss=0.306, pruned_loss=0.05904, over 15732210.63 frames. 
], batch size: 1245, lr: 1.62e-02, grad_scale: 32.0 2024-10-08 06:37:19,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=143088.0, ans=0.125 2024-10-08 06:37:26,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=143208.0, ans=0.125 2024-10-08 06:37:33,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=143208.0, ans=0.125 2024-10-08 06:37:33,503 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=143208.0, ans=0.125 2024-10-08 06:37:35,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=143208.0, ans=0.1 2024-10-08 06:38:00,937 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.905e+02 4.577e+02 5.221e+02 5.899e+02 8.944e+02, threshold=1.044e+03, percent-clipped=0.0 2024-10-08 06:38:01,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=143448.0, ans=0.125 2024-10-08 06:38:06,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=143448.0, ans=0.125 2024-10-08 06:38:08,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=143448.0, ans=0.2 2024-10-08 06:38:11,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=143448.0, ans=0.2 2024-10-08 06:38:20,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=143568.0, ans=0.09899494936611666 2024-10-08 06:38:21,038 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.44 vs. limit=6.0 2024-10-08 06:38:32,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=143568.0, ans=0.0 2024-10-08 06:38:34,784 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.00 vs. limit=12.0 2024-10-08 06:38:35,663 INFO [train.py:1136] (0/2) Epoch 15, batch 550, loss[loss=0.1986, simple_loss=0.2959, pruned_loss=0.05063, over 87277.00 frames. ], tot_loss[loss=0.2119, simple_loss=0.3059, pruned_loss=0.05898, over 16054601.89 frames. ], batch size: 415, lr: 1.61e-02, grad_scale: 32.0 2024-10-08 06:38:57,878 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.46 vs. limit=15.0 2024-10-08 06:39:03,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=143808.0, ans=0.125 2024-10-08 06:39:13,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=143928.0, ans=0.125 2024-10-08 06:39:19,549 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.84 vs. 
limit=12.0 2024-10-08 06:39:22,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=143928.0, ans=0.125 2024-10-08 06:39:23,564 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/checkpoint-12000.pt 2024-10-08 06:40:14,942 INFO [train.py:1136] (0/2) Epoch 15, batch 600, loss[loss=0.2026, simple_loss=0.3044, pruned_loss=0.05041, over 87233.00 frames. ], tot_loss[loss=0.2117, simple_loss=0.3058, pruned_loss=0.05879, over 16287800.40 frames. ], batch size: 517, lr: 1.61e-02, grad_scale: 32.0 2024-10-08 06:40:32,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=144408.0, ans=0.125 2024-10-08 06:40:38,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=144408.0, ans=0.125 2024-10-08 06:40:38,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=144408.0, ans=0.125 2024-10-08 06:40:41,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=144408.0, ans=0.125 2024-10-08 06:40:43,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=144408.0, ans=0.125 2024-10-08 06:41:13,437 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.910e+02 4.484e+02 5.447e+02 6.333e+02 9.982e+02, threshold=1.089e+03, percent-clipped=0.0 2024-10-08 06:41:15,967 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.11 vs. limit=22.5 2024-10-08 06:41:36,715 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=144768.0, ans=0.0 2024-10-08 06:41:48,282 INFO [train.py:1136] (0/2) Epoch 15, batch 650, loss[loss=0.1947, simple_loss=0.2954, pruned_loss=0.04696, over 87114.00 frames. ], tot_loss[loss=0.2111, simple_loss=0.3052, pruned_loss=0.05854, over 16470993.73 frames. ], batch size: 517, lr: 1.61e-02, grad_scale: 32.0 2024-10-08 06:42:22,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=145008.0, ans=10.0 2024-10-08 06:42:25,279 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=145128.0, ans=0.0 2024-10-08 06:42:25,382 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=145128.0, ans=0.2 2024-10-08 06:42:41,562 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.43 vs. limit=15.0 2024-10-08 06:43:16,369 INFO [train.py:1136] (0/2) Epoch 15, batch 700, loss[loss=0.2718, simple_loss=0.3516, pruned_loss=0.09601, over 78659.00 frames. ], tot_loss[loss=0.212, simple_loss=0.3058, pruned_loss=0.05913, over 16588692.07 frames. 
], batch size: 1493, lr: 1.61e-02, grad_scale: 32.0 2024-10-08 06:43:18,242 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=145488.0, ans=0.125 2024-10-08 06:43:24,294 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.56 vs. limit=5.0 2024-10-08 06:43:26,560 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=145488.0, ans=0.125 2024-10-08 06:43:28,589 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.82 vs. limit=10.0 2024-10-08 06:43:36,125 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=145608.0, ans=0.05 2024-10-08 06:43:40,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=145608.0, ans=0.125 2024-10-08 06:43:42,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=145608.0, ans=0.95 2024-10-08 06:43:44,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=145608.0, ans=0.125 2024-10-08 06:43:49,195 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=145728.0, ans=0.125 2024-10-08 06:44:07,651 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.777e+02 4.693e+02 5.419e+02 6.208e+02 9.143e+02, threshold=1.084e+03, percent-clipped=0.0 2024-10-08 06:44:20,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=145968.0, ans=0.0 2024-10-08 06:44:39,042 INFO [train.py:1136] (0/2) Epoch 15, batch 750, loss[loss=0.2012, simple_loss=0.2929, pruned_loss=0.05472, over 87098.00 frames. ], tot_loss[loss=0.2115, simple_loss=0.3054, pruned_loss=0.05882, over 16717056.66 frames. 
], batch size: 330, lr: 1.60e-02, grad_scale: 32.0 2024-10-08 06:44:44,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=146088.0, ans=0.125 2024-10-08 06:44:45,723 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=146088.0, ans=0.025 2024-10-08 06:44:50,357 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=146088.0, ans=10.0 2024-10-08 06:45:07,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=146208.0, ans=0.0 2024-10-08 06:45:08,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=146208.0, ans=0.025 2024-10-08 06:45:10,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=146328.0, ans=0.1 2024-10-08 06:45:46,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=146568.0, ans=0.125 2024-10-08 06:45:52,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=146568.0, ans=0.1 2024-10-08 06:45:58,092 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.05 vs. limit=15.0 2024-10-08 06:46:00,425 INFO [train.py:1136] (0/2) Epoch 15, batch 800, loss[loss=0.2621, simple_loss=0.3395, pruned_loss=0.09236, over 69496.00 frames. ], tot_loss[loss=0.2112, simple_loss=0.3052, pruned_loss=0.05862, over 16780705.42 frames. ], batch size: 1960, lr: 1.60e-02, grad_scale: 32.0 2024-10-08 06:46:13,336 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.32 vs. limit=12.0 2024-10-08 06:46:26,877 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-15.pt 2024-10-08 06:47:21,579 INFO [train.py:1136] (0/2) Epoch 16, batch 0, loss[loss=0.2303, simple_loss=0.3246, pruned_loss=0.06803, over 85507.00 frames. ], tot_loss[loss=0.2303, simple_loss=0.3246, pruned_loss=0.06803, over 85507.00 frames. ], batch size: 787, lr: 1.55e-02, grad_scale: 32.0 2024-10-08 06:47:21,580 INFO [train.py:1159] (0/2) Computing validation loss 2024-10-08 06:47:33,065 INFO [train.py:1168] (0/2) Epoch 16, validation: loss=0.1743, simple_loss=0.2899, pruned_loss=0.02933, over 1382211.00 frames. 
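(Editor's aside: the per-batch training entries in this log all share one fixed shape — `Epoch N, batch M, loss[...], tot_loss[loss=..., simple_loss=..., pruned_loss=..., over F frames. ], batch size: B, lr: ...` — so loss and learning-rate curves can be scraped straight out of the file. A minimal parsing sketch follows, assuming only the format visible in the surrounding lines; the `parse_log` helper and the log path are illustrative, not part of icefall.)

```python
import re

# Matches the training entries seen in this log, e.g.
#   Epoch 15, batch 500, loss[...], tot_loss[loss=0.212, simple_loss=0.306,
#   pruned_loss=0.05904, over 15732210.63 frames. ], batch size: 1245, lr: 1.62e-02
ENTRY = re.compile(
    r"Epoch (?P<epoch>\d+), batch (?P<batch>\d+), .*?"
    r"tot_loss\[loss=(?P<loss>[\d.]+), simple_loss=(?P<simple>[\d.]+), "
    r"pruned_loss=(?P<pruned>[\d.]+), over (?P<frames>[\d.]+) frames"
)
LR = re.compile(r"lr: ([0-9.e+-]+)")

def parse_log(path):
    """Yield (epoch, batch, tot_loss, lr) for every training entry.

    Validation, scaling.py and optim.py lines contain no
    'batch M, ... tot_loss[' sequence, so they are skipped.
    Using finditer keeps this correct even when several entries
    end up concatenated on one physical line, as in this dump.
    """
    with open(path) as f:
        for line in f:
            for m in ENTRY.finditer(line):
                lr = LR.search(line, m.end())  # lr follows the tot_loss block
                yield (int(m["epoch"]), int(m["batch"]), float(m["loss"]),
                       float(lr.group(1)) if lr else None)

# Example: dump the running tot_loss curve (the log path is an assumption).
if __name__ == "__main__":
    for epoch, batch, loss, lr in parse_log("zipformer/exp/log-train"):
        print(f"epoch {epoch:3d} batch {batch:5d}  tot_loss={loss:.4f}  lr={lr}")
```
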
2024-10-08 06:47:33,066 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB 2024-10-08 06:47:43,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=146880.0, ans=0.1 2024-10-08 06:47:46,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=146880.0, ans=0.125 2024-10-08 06:47:52,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=147000.0, ans=0.125 2024-10-08 06:48:04,001 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.915e+02 4.762e+02 5.712e+02 6.645e+02 8.988e+02, threshold=1.142e+03, percent-clipped=0.0 2024-10-08 06:48:15,195 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.35 vs. limit=22.5 2024-10-08 06:48:59,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=147360.0, ans=0.125 2024-10-08 06:49:08,243 INFO [train.py:1136] (0/2) Epoch 16, batch 50, loss[loss=0.2013, simple_loss=0.301, pruned_loss=0.05082, over 87392.00 frames. ], tot_loss[loss=0.2091, simple_loss=0.304, pruned_loss=0.05712, over 3872235.52 frames. ], batch size: 393, lr: 1.55e-02, grad_scale: 32.0 2024-10-08 06:49:58,858 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=147720.0, ans=0.0 2024-10-08 06:50:22,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=147960.0, ans=0.025 2024-10-08 06:50:24,544 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=147960.0, ans=0.125 2024-10-08 06:50:42,383 INFO [train.py:1136] (0/2) Epoch 16, batch 100, loss[loss=0.1879, simple_loss=0.2896, pruned_loss=0.04306, over 87213.00 frames. ], tot_loss[loss=0.2109, simple_loss=0.3061, pruned_loss=0.05787, over 6826088.10 frames. ], batch size: 517, lr: 1.55e-02, grad_scale: 32.0 2024-10-08 06:51:15,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=148200.0, ans=0.125 2024-10-08 06:51:16,462 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.740e+02 4.511e+02 5.243e+02 6.087e+02 9.290e+02, threshold=1.049e+03, percent-clipped=0.0 2024-10-08 06:51:36,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=148320.0, ans=0.125 2024-10-08 06:51:48,096 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0 2024-10-08 06:52:16,767 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=148560.0, ans=0.125 2024-10-08 06:52:19,690 INFO [train.py:1136] (0/2) Epoch 16, batch 150, loss[loss=0.2256, simple_loss=0.3197, pruned_loss=0.06573, over 85299.00 frames. ], tot_loss[loss=0.2128, simple_loss=0.3069, pruned_loss=0.05933, over 9062215.86 frames. 
], batch size: 866, lr: 1.54e-02, grad_scale: 16.0 2024-10-08 06:52:41,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=148800.0, ans=0.125 2024-10-08 06:52:51,881 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=148800.0, ans=0.0 2024-10-08 06:52:57,643 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.07 vs. limit=15.0 2024-10-08 06:53:01,384 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.80 vs. limit=12.0 2024-10-08 06:53:02,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=148920.0, ans=0.2 2024-10-08 06:53:45,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=149160.0, ans=0.125 2024-10-08 06:53:53,452 INFO [train.py:1136] (0/2) Epoch 16, batch 200, loss[loss=0.207, simple_loss=0.2974, pruned_loss=0.05835, over 87209.00 frames. ], tot_loss[loss=0.2122, simple_loss=0.3064, pruned_loss=0.05901, over 10836063.26 frames. ], batch size: 296, lr: 1.54e-02, grad_scale: 16.0 2024-10-08 06:54:03,358 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=149280.0, ans=0.125 2024-10-08 06:54:12,049 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=149400.0, ans=0.125 2024-10-08 06:54:29,409 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 4.102e+02 4.485e+02 5.097e+02 5.830e+02 9.374e+02, threshold=1.019e+03, percent-clipped=0.0 2024-10-08 06:55:10,257 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.46 vs. limit=12.0 2024-10-08 06:55:15,100 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=149760.0, ans=0.0 2024-10-08 06:55:22,014 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=149760.0, ans=0.07 2024-10-08 06:55:30,287 INFO [train.py:1136] (0/2) Epoch 16, batch 250, loss[loss=0.2257, simple_loss=0.3234, pruned_loss=0.06404, over 85678.00 frames. ], tot_loss[loss=0.2121, simple_loss=0.306, pruned_loss=0.05905, over 12164888.24 frames. 
], batch size: 787, lr: 1.54e-02, grad_scale: 16.0 2024-10-08 06:55:43,309 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=149880.0, ans=0.0 2024-10-08 06:55:51,825 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=150000.0, ans=10.0 2024-10-08 06:56:00,418 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-10-08 06:56:01,730 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=150000.0, ans=0.125 2024-10-08 06:56:16,145 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-10-08 06:56:50,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=150360.0, ans=0.025 2024-10-08 06:56:52,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=150360.0, ans=0.2 2024-10-08 06:56:57,596 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=150360.0, ans=0.2 2024-10-08 06:57:04,966 INFO [train.py:1136] (0/2) Epoch 16, batch 300, loss[loss=0.2136, simple_loss=0.3119, pruned_loss=0.0576, over 85960.00 frames. ], tot_loss[loss=0.2096, simple_loss=0.3039, pruned_loss=0.05766, over 13288007.24 frames. ], batch size: 721, lr: 1.54e-02, grad_scale: 16.0 2024-10-08 06:57:38,411 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.570e+02 4.641e+02 5.223e+02 5.936e+02 9.475e+02, threshold=1.045e+03, percent-clipped=0.0 2024-10-08 06:57:44,793 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.67 vs. limit=15.0 2024-10-08 06:58:17,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=150960.0, ans=0.5 2024-10-08 06:58:21,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=150960.0, ans=0.125 2024-10-08 06:58:29,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=150960.0, ans=0.0 2024-10-08 06:58:37,956 INFO [train.py:1136] (0/2) Epoch 16, batch 350, loss[loss=0.1955, simple_loss=0.2874, pruned_loss=0.05175, over 87136.00 frames. ], tot_loss[loss=0.2102, simple_loss=0.3043, pruned_loss=0.05804, over 14103091.07 frames. ], batch size: 280, lr: 1.53e-02, grad_scale: 16.0 2024-10-08 06:58:43,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=151080.0, ans=0.0 2024-10-08 06:59:07,104 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.24 vs. limit=12.0 2024-10-08 06:59:44,205 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-10-08 07:00:13,936 INFO [train.py:1136] (0/2) Epoch 16, batch 400, loss[loss=0.2412, simple_loss=0.334, pruned_loss=0.07416, over 81963.00 frames. ], tot_loss[loss=0.2104, simple_loss=0.3047, pruned_loss=0.05809, over 14753688.87 frames. 
], batch size: 1245, lr: 1.53e-02, grad_scale: 32.0
2024-10-08 07:00:47,550 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.766e+02 5.089e+02 6.206e+02 7.639e+02 1.398e+03, threshold=1.241e+03, percent-clipped=5.0
2024-10-08 07:01:05,471 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=151920.0, ans=0.125
2024-10-08 07:01:08,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=152040.0, ans=0.2
2024-10-08 07:01:15,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=152040.0, ans=0.125
2024-10-08 07:01:34,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=152160.0, ans=0.0
2024-10-08 07:01:36,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=152160.0, ans=0.125
2024-10-08 07:01:41,916 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=152160.0, ans=0.125
2024-10-08 07:01:46,166 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=152160.0, ans=0.0
2024-10-08 07:01:48,875 INFO [train.py:1136] (0/2) Epoch 16, batch 450, loss[loss=0.2009, simple_loss=0.2947, pruned_loss=0.05352, over 87219.00 frames. ], tot_loss[loss=0.2099, simple_loss=0.3042, pruned_loss=0.05778, over 15266268.97 frames. ], batch size: 350, lr: 1.53e-02, grad_scale: 32.0
2024-10-08 07:02:38,810 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-10-08 07:03:01,318 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=152760.0, ans=0.125
2024-10-08 07:03:06,477 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=152760.0, ans=0.2
2024-10-08 07:03:19,480 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.52 vs. limit=15.0
2024-10-08 07:03:21,926 INFO [train.py:1136] (0/2) Epoch 16, batch 500, loss[loss=0.1979, simple_loss=0.296, pruned_loss=0.04988, over 87419.00 frames. ], tot_loss[loss=0.2095, simple_loss=0.304, pruned_loss=0.05745, over 15668968.85 frames. ], batch size: 393, lr: 1.53e-02, grad_scale: 32.0
2024-10-08 07:03:40,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=153000.0, ans=0.0
2024-10-08 07:03:55,265 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.733e+02 4.429e+02 4.882e+02 5.835e+02 8.480e+02, threshold=9.764e+02, percent-clipped=0.0
2024-10-08 07:04:37,454 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.33 vs. limit=10.0
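
The recurring Whitening lines compare a per-module "whiteness" statistic against a limit; the metric (e.g. metric=5.33 vs. limit=10.0 just above) measures how anisotropic the covariance of that module's activations is, and the limit caps the tolerated anisotropy. A rough sketch of one such metric, assuming it is built from the activation covariance; the formula below is illustrative and may not match scaling.py exactly:

import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations for one module.
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.t() @ x) / x.shape[0]
    d = cov.shape[0]
    # Equals 1.0 when all covariance eigenvalues are equal ("white");
    # grows toward d as the spectrum concentrates on few directions.
    return (d * (cov @ cov).trace() / cov.trace() ** 2).item()

white = torch.randn(2000, 192)                   # metric close to 1
skewed = white * torch.linspace(0.1, 3.0, 192)   # metric well above 1
print(whitening_metric(white), whitening_metric(skewed))

These lines are informational monitoring output, not errors: they record the measured metric against the limit at which whitening pressure would be applied.
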
2024-10-08 07:04:43,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=153360.0, ans=0.025
2024-10-08 07:04:43,312 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=153360.0, ans=0.125
2024-10-08 07:04:57,700 INFO [train.py:1136] (0/2) Epoch 16, batch 550, loss[loss=0.1908, simple_loss=0.2819, pruned_loss=0.04983, over 85866.00 frames. ], tot_loss[loss=0.2093, simple_loss=0.3038, pruned_loss=0.05738, over 15990723.37 frames. ], batch size: 180, lr: 1.53e-02, grad_scale: 32.0
2024-10-08 07:04:58,608 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.20 vs. limit=12.0
2024-10-08 07:05:34,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=153720.0, ans=0.0
2024-10-08 07:05:53,881 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=153840.0, ans=0.025
2024-10-08 07:06:04,951 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=153840.0, ans=0.2
2024-10-08 07:06:34,428 INFO [train.py:1136] (0/2) Epoch 16, batch 600, loss[loss=0.2326, simple_loss=0.3264, pruned_loss=0.06938, over 84449.00 frames. ], tot_loss[loss=0.2104, simple_loss=0.3049, pruned_loss=0.05794, over 16210737.95 frames. ], batch size: 958, lr: 1.52e-02, grad_scale: 16.0
2024-10-08 07:06:51,480 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=154080.0, ans=0.125
2024-10-08 07:06:56,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=154200.0, ans=0.0
2024-10-08 07:07:10,101 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.911e+02 4.911e+02 5.810e+02 6.755e+02 9.548e+02, threshold=1.162e+03, percent-clipped=0.0
2024-10-08 07:08:05,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=154560.0, ans=0.125
2024-10-08 07:08:05,945 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.62 vs. limit=15.0
2024-10-08 07:08:10,353 INFO [train.py:1136] (0/2) Epoch 16, batch 650, loss[loss=0.2006, simple_loss=0.2852, pruned_loss=0.05805, over 85624.00 frames. ], tot_loss[loss=0.2101, simple_loss=0.3047, pruned_loss=0.05774, over 16401948.15 frames. ], batch size: 180, lr: 1.52e-02, grad_scale: 16.0
2024-10-08 07:08:11,344 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.06 vs.
limit=15.0 2024-10-08 07:08:15,505 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=154680.0, ans=0.125 2024-10-08 07:08:31,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=154800.0, ans=0.125 2024-10-08 07:08:31,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=154800.0, ans=0.125 2024-10-08 07:08:45,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=154920.0, ans=0.125 2024-10-08 07:08:47,730 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=154920.0, ans=0.025 2024-10-08 07:09:26,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=155160.0, ans=0.0 2024-10-08 07:09:32,036 INFO [train.py:1136] (0/2) Epoch 16, batch 700, loss[loss=0.196, simple_loss=0.2969, pruned_loss=0.04762, over 87334.00 frames. ], tot_loss[loss=0.2095, simple_loss=0.3041, pruned_loss=0.05748, over 16529978.74 frames. ], batch size: 490, lr: 1.52e-02, grad_scale: 16.0 2024-10-08 07:09:50,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=155400.0, ans=0.0 2024-10-08 07:09:51,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=155400.0, ans=0.1 2024-10-08 07:09:53,769 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.38 vs. limit=22.5 2024-10-08 07:10:05,690 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.899e+02 4.693e+02 5.317e+02 5.866e+02 9.175e+02, threshold=1.063e+03, percent-clipped=0.0 2024-10-08 07:10:10,685 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=155520.0, ans=0.125 2024-10-08 07:10:48,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=155760.0, ans=0.125 2024-10-08 07:10:51,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=155760.0, ans=0.0 2024-10-08 07:10:55,966 INFO [train.py:1136] (0/2) Epoch 16, batch 750, loss[loss=0.1981, simple_loss=0.2961, pruned_loss=0.05008, over 87257.00 frames. ], tot_loss[loss=0.21, simple_loss=0.3045, pruned_loss=0.05771, over 16644178.57 frames. ], batch size: 490, lr: 1.52e-02, grad_scale: 16.0 2024-10-08 07:10:59,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=155880.0, ans=0.125 2024-10-08 07:11:15,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=156000.0, ans=0.125 2024-10-08 07:11:15,748 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.73 vs. 
limit=15.0 2024-10-08 07:11:21,241 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=156000.0, ans=0.2 2024-10-08 07:11:21,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=156000.0, ans=0.07 2024-10-08 07:11:42,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=156120.0, ans=0.125 2024-10-08 07:12:18,031 INFO [train.py:1136] (0/2) Epoch 16, batch 800, loss[loss=0.1953, simple_loss=0.2867, pruned_loss=0.05193, over 87209.00 frames. ], tot_loss[loss=0.2102, simple_loss=0.3046, pruned_loss=0.05787, over 16739332.59 frames. ], batch size: 280, lr: 1.51e-02, grad_scale: 32.0 2024-10-08 07:12:28,247 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=156480.0, ans=0.125 2024-10-08 07:12:32,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=156600.0, ans=0.125 2024-10-08 07:12:33,478 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.71 vs. limit=15.0 2024-10-08 07:12:43,461 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-16.pt 2024-10-08 07:13:25,830 INFO [train.py:1136] (0/2) Epoch 17, batch 0, loss[loss=0.2169, simple_loss=0.3102, pruned_loss=0.06175, over 86833.00 frames. ], tot_loss[loss=0.2169, simple_loss=0.3102, pruned_loss=0.06175, over 86833.00 frames. ], batch size: 547, lr: 1.47e-02, grad_scale: 32.0 2024-10-08 07:13:25,832 INFO [train.py:1159] (0/2) Computing validation loss 2024-10-08 07:13:36,702 INFO [train.py:1168] (0/2) Epoch 17, validation: loss=0.1716, simple_loss=0.2858, pruned_loss=0.02868, over 1382211.00 frames. 2024-10-08 07:13:36,702 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB 2024-10-08 07:13:41,778 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.881e+02 4.707e+02 5.416e+02 6.607e+02 8.807e+02, threshold=1.083e+03, percent-clipped=0.0 2024-10-08 07:13:56,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=156792.0, ans=0.0 2024-10-08 07:14:19,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=156912.0, ans=0.0 2024-10-08 07:14:19,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=156912.0, ans=0.5 2024-10-08 07:14:21,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=156912.0, ans=0.125 2024-10-08 07:14:23,168 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=156912.0, ans=0.0 2024-10-08 07:14:55,889 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.78 vs. limit=15.0 2024-10-08 07:15:12,898 INFO [train.py:1136] (0/2) Epoch 17, batch 50, loss[loss=0.197, simple_loss=0.2925, pruned_loss=0.05075, over 87145.00 frames. ], tot_loss[loss=0.2058, simple_loss=0.3005, pruned_loss=0.05557, over 3870222.78 frames. 
], batch size: 330, lr: 1.47e-02, grad_scale: 16.0 2024-10-08 07:15:34,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=157392.0, ans=0.0 2024-10-08 07:15:42,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=157392.0, ans=0.0 2024-10-08 07:15:53,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=157512.0, ans=0.2 2024-10-08 07:16:08,622 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=157632.0, ans=0.125 2024-10-08 07:16:24,578 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.18 vs. limit=12.0 2024-10-08 07:16:41,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=157752.0, ans=0.125 2024-10-08 07:16:43,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=157752.0, ans=0.07 2024-10-08 07:16:49,732 INFO [train.py:1136] (0/2) Epoch 17, batch 100, loss[loss=0.2122, simple_loss=0.3094, pruned_loss=0.05747, over 86518.00 frames. ], tot_loss[loss=0.2068, simple_loss=0.3015, pruned_loss=0.05602, over 6802092.72 frames. ], batch size: 621, lr: 1.46e-02, grad_scale: 16.0 2024-10-08 07:16:52,753 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.67 vs. limit=15.0 2024-10-08 07:16:56,399 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.690e+02 4.551e+02 5.114e+02 5.862e+02 9.907e+02, threshold=1.023e+03, percent-clipped=0.0 2024-10-08 07:17:12,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=157992.0, ans=0.2 2024-10-08 07:18:25,283 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=158472.0, ans=15.0 2024-10-08 07:18:26,005 INFO [train.py:1136] (0/2) Epoch 17, batch 150, loss[loss=0.189, simple_loss=0.2827, pruned_loss=0.04767, over 87292.00 frames. ], tot_loss[loss=0.2057, simple_loss=0.3006, pruned_loss=0.05545, over 9088869.09 frames. ], batch size: 313, lr: 1.46e-02, grad_scale: 16.0 2024-10-08 07:18:38,673 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=158472.0, ans=0.0 2024-10-08 07:19:17,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=158712.0, ans=0.0 2024-10-08 07:19:31,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=158832.0, ans=0.1 2024-10-08 07:19:37,735 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.82 vs. limit=5.0 2024-10-08 07:19:46,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=158952.0, ans=0.125 2024-10-08 07:20:00,109 INFO [train.py:1136] (0/2) Epoch 17, batch 200, loss[loss=0.1925, simple_loss=0.2897, pruned_loss=0.04764, over 87299.00 frames. ], tot_loss[loss=0.207, simple_loss=0.3022, pruned_loss=0.05587, over 10858934.69 frames. 
], batch size: 372, lr: 1.46e-02, grad_scale: 16.0 2024-10-08 07:20:06,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.whiten.whitening_limit, batch_count=159072.0, ans=12.0 2024-10-08 07:20:07,052 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.924e+02 4.761e+02 5.215e+02 6.058e+02 1.037e+03, threshold=1.043e+03, percent-clipped=1.0 2024-10-08 07:20:16,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=159192.0, ans=0.125 2024-10-08 07:20:17,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=159192.0, ans=0.125 2024-10-08 07:20:24,772 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=159192.0, ans=0.1 2024-10-08 07:20:25,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=159192.0, ans=6.0 2024-10-08 07:20:33,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=159312.0, ans=0.0 2024-10-08 07:20:44,087 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.68 vs. limit=22.5 2024-10-08 07:21:25,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=159552.0, ans=0.125 2024-10-08 07:21:26,658 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.10 vs. limit=22.5 2024-10-08 07:21:33,268 INFO [train.py:1136] (0/2) Epoch 17, batch 250, loss[loss=0.1904, simple_loss=0.2894, pruned_loss=0.04571, over 87390.00 frames. ], tot_loss[loss=0.2078, simple_loss=0.303, pruned_loss=0.05628, over 12248270.64 frames. ], batch size: 415, lr: 1.46e-02, grad_scale: 16.0 2024-10-08 07:21:34,513 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.64 vs. limit=22.5 2024-10-08 07:21:39,166 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=159672.0, ans=0.125 2024-10-08 07:21:39,910 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.00 vs. limit=15.0 2024-10-08 07:21:53,328 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.22 vs. limit=12.0 2024-10-08 07:22:47,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=160152.0, ans=0.0 2024-10-08 07:23:04,319 INFO [train.py:1136] (0/2) Epoch 17, batch 300, loss[loss=0.2157, simple_loss=0.3144, pruned_loss=0.05849, over 85264.00 frames. ], tot_loss[loss=0.207, simple_loss=0.3022, pruned_loss=0.05583, over 13328557.71 frames. 
], batch size: 866, lr: 1.46e-02, grad_scale: 16.0 2024-10-08 07:23:13,498 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.835e+02 4.742e+02 5.222e+02 6.140e+02 7.816e+02, threshold=1.044e+03, percent-clipped=0.0 2024-10-08 07:23:19,816 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=160272.0, ans=0.125 2024-10-08 07:23:36,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=160392.0, ans=0.0 2024-10-08 07:24:01,601 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=160632.0, ans=0.0 2024-10-08 07:24:40,176 INFO [train.py:1136] (0/2) Epoch 17, batch 350, loss[loss=0.1994, simple_loss=0.2981, pruned_loss=0.05038, over 87287.00 frames. ], tot_loss[loss=0.2067, simple_loss=0.302, pruned_loss=0.05571, over 14157629.00 frames. ], batch size: 464, lr: 1.45e-02, grad_scale: 16.0 2024-10-08 07:25:11,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=160992.0, ans=0.125 2024-10-08 07:25:51,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=161232.0, ans=0.2 2024-10-08 07:26:03,929 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=161352.0, ans=0.125 2024-10-08 07:26:13,983 INFO [train.py:1136] (0/2) Epoch 17, batch 400, loss[loss=0.1959, simple_loss=0.2952, pruned_loss=0.04826, over 87274.00 frames. ], tot_loss[loss=0.2074, simple_loss=0.3027, pruned_loss=0.05605, over 14804257.12 frames. ], batch size: 439, lr: 1.45e-02, grad_scale: 32.0 2024-10-08 07:26:16,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=161472.0, ans=0.125 2024-10-08 07:26:16,499 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=161472.0, ans=0.125 2024-10-08 07:26:21,204 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.825e+02 4.860e+02 5.747e+02 6.934e+02 1.055e+03, threshold=1.149e+03, percent-clipped=1.0 2024-10-08 07:26:39,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=161592.0, ans=0.125 2024-10-08 07:27:32,476 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=161952.0, ans=0.125 2024-10-08 07:27:47,295 INFO [train.py:1136] (0/2) Epoch 17, batch 450, loss[loss=0.1939, simple_loss=0.2793, pruned_loss=0.05426, over 85845.00 frames. ], tot_loss[loss=0.2074, simple_loss=0.3025, pruned_loss=0.05619, over 15298662.11 frames. ], batch size: 180, lr: 1.45e-02, grad_scale: 32.0 2024-10-08 07:28:01,984 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.27 vs. 
limit=10.0 2024-10-08 07:28:06,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=162192.0, ans=0.1 2024-10-08 07:28:06,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=162192.0, ans=0.125 2024-10-08 07:28:26,655 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.13 vs. limit=15.0 2024-10-08 07:29:22,289 INFO [train.py:1136] (0/2) Epoch 17, batch 500, loss[loss=0.1917, simple_loss=0.2844, pruned_loss=0.04954, over 87371.00 frames. ], tot_loss[loss=0.2071, simple_loss=0.3021, pruned_loss=0.05605, over 15722360.03 frames. ], batch size: 296, lr: 1.45e-02, grad_scale: 32.0 2024-10-08 07:29:24,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=162672.0, ans=0.0 2024-10-08 07:29:29,487 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=162672.0, ans=10.0 2024-10-08 07:29:30,787 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.684e+02 4.334e+02 4.776e+02 5.536e+02 8.909e+02, threshold=9.553e+02, percent-clipped=0.0 2024-10-08 07:29:34,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=162672.0, ans=0.025 2024-10-08 07:29:55,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=162912.0, ans=0.125 2024-10-08 07:30:24,091 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=163032.0, ans=0.0 2024-10-08 07:30:53,009 INFO [train.py:1136] (0/2) Epoch 17, batch 550, loss[loss=0.1901, simple_loss=0.2821, pruned_loss=0.04901, over 86617.00 frames. ], tot_loss[loss=0.2068, simple_loss=0.3019, pruned_loss=0.05589, over 16036830.05 frames. ], batch size: 213, lr: 1.45e-02, grad_scale: 16.0 2024-10-08 07:30:55,254 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-10-08 07:31:01,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=163272.0, ans=0.0 2024-10-08 07:31:20,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=163392.0, ans=0.1 2024-10-08 07:31:34,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=163512.0, ans=0.2 2024-10-08 07:32:27,178 INFO [train.py:1136] (0/2) Epoch 17, batch 600, loss[loss=0.1976, simple_loss=0.2961, pruned_loss=0.0495, over 87383.00 frames. ], tot_loss[loss=0.2058, simple_loss=0.301, pruned_loss=0.05531, over 16282839.80 frames. 
], batch size: 372, lr: 1.44e-02, grad_scale: 16.0
2024-10-08 07:32:35,806 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.671e+02 4.497e+02 5.024e+02 5.899e+02 8.276e+02, threshold=1.005e+03, percent-clipped=0.0
2024-10-08 07:33:00,004 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-10-08 07:33:01,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=163992.0, ans=0.125
2024-10-08 07:33:03,815 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.04 vs. limit=15.0
2024-10-08 07:33:59,740 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=164352.0, ans=0.125
2024-10-08 07:34:01,789 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.04 vs. limit=22.5
2024-10-08 07:34:02,632 INFO [train.py:1136] (0/2) Epoch 17, batch 650, loss[loss=0.1905, simple_loss=0.2937, pruned_loss=0.04369, over 87313.00 frames. ], tot_loss[loss=0.2056, simple_loss=0.3007, pruned_loss=0.05526, over 16485508.71 frames. ], batch size: 490, lr: 1.44e-02, grad_scale: 16.0
2024-10-08 07:34:16,772 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.79 vs. limit=6.0
2024-10-08 07:34:25,786 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=164592.0, ans=0.125
2024-10-08 07:34:47,152 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.29 vs. limit=12.0
2024-10-08 07:35:16,018 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.87 vs. limit=12.0
2024-10-08 07:35:18,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=164952.0, ans=0.125
2024-10-08 07:35:23,795 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=164952.0, ans=0.1
2024-10-08 07:35:26,726 INFO [train.py:1136] (0/2) Epoch 17, batch 700, loss[loss=0.1931, simple_loss=0.2806, pruned_loss=0.0528, over 85531.00 frames. ], tot_loss[loss=0.2049, simple_loss=0.3, pruned_loss=0.05489, over 16652403.85 frames. ], batch size: 180, lr: 1.44e-02, grad_scale: 16.0
2024-10-08 07:35:34,688 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.880e+02 4.315e+02 5.113e+02 5.996e+02 1.387e+03, threshold=1.023e+03, percent-clipped=2.0
2024-10-08 07:36:12,206 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.13 vs. limit=15.0
2024-10-08 07:36:47,382 INFO [train.py:1136] (0/2) Epoch 17, batch 750, loss[loss=0.1944, simple_loss=0.2945, pruned_loss=0.04714, over 87288.00 frames. ], tot_loss[loss=0.2057, simple_loss=0.3005, pruned_loss=0.05539, over 16718940.17 frames. ], batch size: 415, lr: 1.44e-02, grad_scale: 16.0
2024-10-08 07:36:54,495 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.73 vs. limit=6.0
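
Each optim.py WARNING above summarizes recent gradient norms as five quantiles (min, 25%, median, 75%, max), a clipping threshold, and the share of recent batches that were clipped. In these lines the threshold tracks Clipping_scale times the median (e.g. 1.005e+03 = 2.0 x 5.024e+02 just above), suggesting median-based adaptive clipping. A sketch of that scheme with hypothetical parameter choices; the real optimizer's rule may differ in detail:

import torch

def clip_gradients(params, norm_history, clipping_scale=2.0, window=200):
    # Adaptive clipping sketch: threshold = clipping_scale * median of
    # recent global grad norms (illustrative, not the icefall optimizer).
    grads = [p.grad for p in params if p.grad is not None]
    norm = torch.norm(torch.stack([g.norm() for g in grads]))
    norm_history.append(norm.item())
    recent = torch.tensor(norm_history[-window:])
    quantiles = torch.quantile(recent, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * quantiles[2].item()  # 2x the median
    clipped = norm.item() > threshold
    if clipped:
        for g in grads:
            g.mul_(threshold / norm)  # rescale to the threshold
    return quantiles, threshold, clipped

Tracking how often clipped is True over the same window then yields the percent-clipped field at the end of each WARNING line.
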
2024-10-08 07:37:08,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=165792.0, ans=0.1
2024-10-08 07:38:10,329 INFO [train.py:1136] (0/2) Epoch 17, batch 800, loss[loss=0.1878, simple_loss=0.2817, pruned_loss=0.047, over 86773.00 frames. ], tot_loss[loss=0.2061, simple_loss=0.3007, pruned_loss=0.05579, over 16708514.66 frames. ], batch size: 246, lr: 1.44e-02, grad_scale: 32.0
2024-10-08 07:38:18,325 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.801e+02 4.619e+02 5.485e+02 6.343e+02 1.407e+03, threshold=1.097e+03, percent-clipped=2.0
2024-10-08 07:38:20,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=166272.0, ans=0.1
2024-10-08 07:38:22,389 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=166272.0, ans=10.0
2024-10-08 07:38:36,513 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-17.pt
2024-10-08 07:39:30,344 INFO [train.py:1136] (0/2) Epoch 18, batch 0, loss[loss=0.1962, simple_loss=0.2964, pruned_loss=0.04803, over 87277.00 frames. ], tot_loss[loss=0.1962, simple_loss=0.2964, pruned_loss=0.04803, over 87277.00 frames. ], batch size: 415, lr: 1.39e-02, grad_scale: 32.0
2024-10-08 07:39:30,345 INFO [train.py:1159] (0/2) Computing validation loss
2024-10-08 07:39:41,192 INFO [train.py:1168] (0/2) Epoch 18, validation: loss=0.1724, simple_loss=0.2862, pruned_loss=0.02933, over 1382211.00 frames.
2024-10-08 07:39:41,193 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB
2024-10-08 07:39:50,710 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.76 vs. limit=15.0
2024-10-08 07:40:16,252 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.32 vs. limit=15.0
2024-10-08 07:40:21,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=166704.0, ans=0.0
2024-10-08 07:40:49,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=166824.0, ans=0.0
2024-10-08 07:40:58,156 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.19 vs. limit=15.0
2024-10-08 07:41:15,248 INFO [train.py:1136] (0/2) Epoch 18, batch 50, loss[loss=0.2148, simple_loss=0.3087, pruned_loss=0.06044, over 86455.00 frames. ], tot_loss[loss=0.2071, simple_loss=0.3029, pruned_loss=0.05563, over 3880241.59 frames. ], batch size: 620, lr: 1.39e-02, grad_scale: 32.0
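
As the block above shows, each epoch ends with a checkpoint (zipformer/exp/epoch-17.pt) and the next epoch opens at batch 0 by computing a validation loss over a fixed dev set, always 1382211.00 frames, before training resumes. A simplified sketch of such a validation pass, with a hypothetical loader and criterion interface rather than the actual train.py helpers:

import torch
from torch import nn

def validate(model: nn.Module, loader, criterion, device="cpu") -> float:
    # Frame-weighted average over the whole dev set, mirroring the
    # "Computing validation loss" -> "validation: loss=..." pair above.
    model.eval()
    total, frames = 0.0, 0
    with torch.no_grad():
        for feats, targets, num_frames in loader:  # assumed batch layout
            loss = criterion(model(feats.to(device)), targets.to(device))
            total += loss.item() * num_frames
            frames += num_frames
    model.train()
    return total / frames

Because the dev set is fixed, the per-epoch validation losses are directly comparable (0.1716 at epoch 17 above, 0.1724 here at epoch 18).
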
2024-10-08 07:41:51,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=167304.0, ans=0.125
2024-10-08 07:42:04,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=167304.0, ans=0.125
2024-10-08 07:42:13,792 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=167424.0, ans=0.0
2024-10-08 07:42:31,138 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.747e+02 4.717e+02 5.295e+02 6.120e+02 8.338e+02, threshold=1.059e+03, percent-clipped=0.0
2024-10-08 07:42:39,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=167544.0, ans=0.0
2024-10-08 07:42:52,685 INFO [train.py:1136] (0/2) Epoch 18, batch 100, loss[loss=0.1882, simple_loss=0.2818, pruned_loss=0.04731, over 86471.00 frames. ], tot_loss[loss=0.2064, simple_loss=0.3016, pruned_loss=0.0556, over 6787429.15 frames. ], batch size: 213, lr: 1.39e-02, grad_scale: 32.0
2024-10-08 07:43:02,351 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.05 vs. limit=15.0
2024-10-08 07:43:09,076 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.09 vs. limit=15.0
2024-10-08 07:43:20,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=167784.0, ans=0.125
2024-10-08 07:43:38,652 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=167904.0, ans=0.035
2024-10-08 07:44:26,575 INFO [train.py:1136] (0/2) Epoch 18, batch 150, loss[loss=0.1875, simple_loss=0.2907, pruned_loss=0.04217, over 87261.00 frames. ], tot_loss[loss=0.205, simple_loss=0.3007, pruned_loss=0.05468, over 9071518.03 frames. ], batch size: 439, lr: 1.39e-02, grad_scale: 16.0
2024-10-08 07:44:27,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=168264.0, ans=0.1
2024-10-08 07:44:31,217 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=168264.0, ans=0.0
2024-10-08 07:44:47,361 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.38 vs. limit=6.0
2024-10-08 07:44:55,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=168384.0, ans=0.125
2024-10-08 07:44:56,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=168384.0, ans=0.0
2024-10-08 07:44:57,069 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-10-08 07:45:06,339 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.15 vs.
limit=22.5 2024-10-08 07:45:16,250 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=168504.0, ans=0.0 2024-10-08 07:45:24,049 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=168624.0, ans=0.09899494936611666 2024-10-08 07:45:24,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=168624.0, ans=0.125 2024-10-08 07:45:42,532 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.747e+02 4.485e+02 5.279e+02 5.744e+02 8.024e+02, threshold=1.056e+03, percent-clipped=0.0 2024-10-08 07:45:43,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=168744.0, ans=0.0 2024-10-08 07:46:01,998 INFO [train.py:1136] (0/2) Epoch 18, batch 200, loss[loss=0.1885, simple_loss=0.2891, pruned_loss=0.04397, over 87315.00 frames. ], tot_loss[loss=0.2044, simple_loss=0.2999, pruned_loss=0.05448, over 10841325.33 frames. ], batch size: 439, lr: 1.39e-02, grad_scale: 16.0 2024-10-08 07:47:05,815 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=169224.0, ans=0.125 2024-10-08 07:47:19,281 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-10-08 07:47:21,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=169344.0, ans=0.125 2024-10-08 07:47:21,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=169344.0, ans=0.125 2024-10-08 07:47:36,729 INFO [train.py:1136] (0/2) Epoch 18, batch 250, loss[loss=0.1901, simple_loss=0.2887, pruned_loss=0.04576, over 87243.00 frames. ], tot_loss[loss=0.204, simple_loss=0.2997, pruned_loss=0.05416, over 12216122.07 frames. ], batch size: 439, lr: 1.38e-02, grad_scale: 16.0 2024-10-08 07:48:04,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=169584.0, ans=0.125 2024-10-08 07:48:08,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=169584.0, ans=0.95 2024-10-08 07:48:13,141 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=169584.0, ans=0.025 2024-10-08 07:48:28,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=169704.0, ans=0.07 2024-10-08 07:48:28,827 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.37 vs. limit=15.0 2024-10-08 07:48:55,050 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.691e+02 4.648e+02 5.278e+02 5.998e+02 1.204e+03, threshold=1.056e+03, percent-clipped=1.0 2024-10-08 07:49:00,811 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=169944.0, ans=0.09899494936611666 2024-10-08 07:49:12,648 INFO [train.py:1136] (0/2) Epoch 18, batch 300, loss[loss=0.183, simple_loss=0.2826, pruned_loss=0.04169, over 87312.00 frames. ], tot_loss[loss=0.2041, simple_loss=0.2999, pruned_loss=0.05413, over 13281870.87 frames. 
], batch size: 439, lr: 1.38e-02, grad_scale: 16.0 2024-10-08 07:49:26,331 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.03 vs. limit=6.0 2024-10-08 07:50:24,527 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.46 vs. limit=10.0 2024-10-08 07:50:50,995 INFO [train.py:1136] (0/2) Epoch 18, batch 350, loss[loss=0.255, simple_loss=0.3414, pruned_loss=0.08428, over 78734.00 frames. ], tot_loss[loss=0.2052, simple_loss=0.3004, pruned_loss=0.05498, over 14047241.71 frames. ], batch size: 1493, lr: 1.38e-02, grad_scale: 16.0 2024-10-08 07:50:56,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=170664.0, ans=0.1 2024-10-08 07:50:57,175 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.91 vs. limit=6.0 2024-10-08 07:51:19,565 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-10-08 07:51:26,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=170904.0, ans=0.125 2024-10-08 07:51:59,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=171024.0, ans=0.125 2024-10-08 07:52:07,590 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.743e+02 4.625e+02 5.229e+02 6.046e+02 1.260e+03, threshold=1.046e+03, percent-clipped=3.0 2024-10-08 07:52:14,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=171144.0, ans=0.2 2024-10-08 07:52:14,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=171144.0, ans=0.125 2024-10-08 07:52:24,346 INFO [train.py:1136] (0/2) Epoch 18, batch 400, loss[loss=0.211, simple_loss=0.3076, pruned_loss=0.05725, over 86497.00 frames. ], tot_loss[loss=0.2048, simple_loss=0.3, pruned_loss=0.05476, over 14723974.36 frames. ], batch size: 620, lr: 1.38e-02, grad_scale: 32.0 2024-10-08 07:52:24,802 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=171264.0, ans=0.025 2024-10-08 07:52:55,138 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.55 vs. limit=15.0 2024-10-08 07:53:57,249 INFO [train.py:1136] (0/2) Epoch 18, batch 450, loss[loss=0.2135, simple_loss=0.3102, pruned_loss=0.05842, over 85569.00 frames. ], tot_loss[loss=0.2035, simple_loss=0.2989, pruned_loss=0.05407, over 15266106.58 frames. 
], batch size: 787, lr: 1.38e-02, grad_scale: 16.0
2024-10-08 07:54:02,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=171864.0, ans=0.125
2024-10-08 07:54:49,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=172104.0, ans=0.125
2024-10-08 07:54:52,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=172224.0, ans=0.0
2024-10-08 07:55:12,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=172344.0, ans=0.025
2024-10-08 07:55:13,927 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=172344.0, ans=0.125
2024-10-08 07:55:15,243 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.615e+02 4.524e+02 4.996e+02 5.703e+02 1.021e+03, threshold=9.993e+02, percent-clipped=0.0
2024-10-08 07:55:19,750 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=172344.0, ans=0.1
2024-10-08 07:55:32,520 INFO [train.py:1136] (0/2) Epoch 18, batch 500, loss[loss=0.1875, simple_loss=0.2803, pruned_loss=0.04734, over 86560.00 frames. ], tot_loss[loss=0.2031, simple_loss=0.2986, pruned_loss=0.05379, over 15684805.05 frames. ], batch size: 246, lr: 1.37e-02, grad_scale: 16.0
2024-10-08 07:55:48,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=172584.0, ans=0.125
2024-10-08 07:55:53,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=172584.0, ans=0.1
2024-10-08 07:56:05,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=172584.0, ans=0.125
2024-10-08 07:56:14,536 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=172704.0, ans=0.2
2024-10-08 07:56:41,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=172824.0, ans=0.2
2024-10-08 07:56:45,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=172944.0, ans=0.125
2024-10-08 07:57:05,793 INFO [train.py:1136] (0/2) Epoch 18, batch 550, loss[loss=0.1965, simple_loss=0.29, pruned_loss=0.05149, over 87303.00 frames. ], tot_loss[loss=0.2024, simple_loss=0.2978, pruned_loss=0.05345, over 15994889.44 frames. ], batch size: 313, lr: 1.37e-02, grad_scale: 16.0
2024-10-08 07:57:15,582 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=173064.0, ans=0.0
2024-10-08 07:57:39,506 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=173184.0, ans=0.025
2024-10-08 07:58:00,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=173424.0, ans=0.0
2024-10-08 07:58:01,256 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0
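
The ScheduledFloat lines record hyperparameters (dropout probabilities, skip rates, balancer limits) whose current value ans is a function of batch_count rather than a constant; by batch_count around 173k most of them have settled at fixed values. A piecewise-linear schedule of the following shape would behave that way; this is an illustrative stand-in for ScheduledFloat, and the breakpoints below are invented, not taken from scaling.py:

class PiecewiseLinear:
    # Linear interpolation between (batch_count, value) breakpoints,
    # clamped at both ends (hypothetical stand-in for ScheduledFloat).
    def __init__(self, *points):
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                return y0 + (batch_count - x0) * (y1 - y0) / (x1 - x0)
        return pts[-1][1]

# Invented breakpoints: decay from 0.3 to 0.1 over the first 20k batches.
dropout_p = PiecewiseLinear((0.0, 0.3), (20000.0, 0.1))
print(dropout_p(173424.0))  # -> 0.1, the clamped final value
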
2024-10-08 07:58:13,372 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=173424.0, ans=0.125
2024-10-08 07:58:18,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=173424.0, ans=0.125
2024-10-08 07:58:23,448 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.991e+02 4.537e+02 4.998e+02 5.715e+02 9.810e+02, threshold=9.996e+02, percent-clipped=0.0
2024-10-08 07:58:27,892 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.54 vs. limit=15.0
2024-10-08 07:58:41,666 INFO [train.py:1136] (0/2) Epoch 18, batch 600, loss[loss=0.186, simple_loss=0.2804, pruned_loss=0.04576, over 86512.00 frames. ], tot_loss[loss=0.2016, simple_loss=0.2971, pruned_loss=0.05304, over 16257942.06 frames. ], batch size: 246, lr: 1.37e-02, grad_scale: 16.0
2024-10-08 07:58:48,909 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=173664.0, ans=0.125
2024-10-08 07:59:28,678 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=173904.0, ans=0.0
2024-10-08 07:59:42,164 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.73 vs. limit=15.0
2024-10-08 08:00:03,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=174144.0, ans=0.125
2024-10-08 08:00:08,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=174144.0, ans=0.07
2024-10-08 08:00:14,605 INFO [train.py:1136] (0/2) Epoch 18, batch 650, loss[loss=0.1948, simple_loss=0.2948, pruned_loss=0.04737, over 87192.00 frames. ], tot_loss[loss=0.2017, simple_loss=0.297, pruned_loss=0.05318, over 16441946.81 frames. ], batch size: 372, lr: 1.37e-02, grad_scale: 16.0
2024-10-08 08:00:30,083 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.31 vs. limit=22.5
2024-10-08 08:00:50,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=174504.0, ans=0.1
2024-10-08 08:01:03,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=174624.0, ans=0.125
2024-10-08 08:01:03,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=174624.0, ans=0.0
2024-10-08 08:01:07,201 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.59 vs.
limit=15.0 2024-10-08 08:01:23,116 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=174744.0, ans=0.025 2024-10-08 08:01:24,219 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.732e+02 4.477e+02 4.947e+02 5.860e+02 9.583e+02, threshold=9.894e+02, percent-clipped=0.0 2024-10-08 08:01:29,525 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=174744.0, ans=0.125 2024-10-08 08:01:31,292 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.82 vs. limit=15.0 2024-10-08 08:01:35,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=174744.0, ans=0.0 2024-10-08 08:01:38,654 INFO [train.py:1136] (0/2) Epoch 18, batch 700, loss[loss=0.1854, simple_loss=0.2795, pruned_loss=0.04563, over 86426.00 frames. ], tot_loss[loss=0.2017, simple_loss=0.2972, pruned_loss=0.05312, over 16582764.20 frames. ], batch size: 229, lr: 1.37e-02, grad_scale: 16.0 2024-10-08 08:02:19,333 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.63 vs. limit=6.0 2024-10-08 08:02:44,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=175344.0, ans=0.0 2024-10-08 08:02:51,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=175344.0, ans=0.2 2024-10-08 08:03:01,056 INFO [train.py:1136] (0/2) Epoch 18, batch 750, loss[loss=0.2163, simple_loss=0.3111, pruned_loss=0.06078, over 85538.00 frames. ], tot_loss[loss=0.202, simple_loss=0.2975, pruned_loss=0.05321, over 16679301.48 frames. ], batch size: 787, lr: 1.37e-02, grad_scale: 16.0 2024-10-08 08:03:26,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=175584.0, ans=0.125 2024-10-08 08:03:39,291 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.05 vs. limit=6.0 2024-10-08 08:04:10,260 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.828e+02 4.784e+02 5.587e+02 6.499e+02 9.719e+02, threshold=1.117e+03, percent-clipped=0.0 2024-10-08 08:04:16,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=175944.0, ans=0.2 2024-10-08 08:04:25,123 INFO [train.py:1136] (0/2) Epoch 18, batch 800, loss[loss=0.1851, simple_loss=0.2786, pruned_loss=0.04582, over 87182.00 frames. ], tot_loss[loss=0.2029, simple_loss=0.2981, pruned_loss=0.05381, over 16759565.05 frames. ], batch size: 264, lr: 1.36e-02, grad_scale: 32.0 2024-10-08 08:04:44,190 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=176184.0, ans=15.0 2024-10-08 08:04:50,885 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-18.pt 2024-10-08 08:05:43,908 INFO [train.py:1136] (0/2) Epoch 19, batch 0, loss[loss=0.2113, simple_loss=0.3092, pruned_loss=0.05664, over 85228.00 frames. ], tot_loss[loss=0.2113, simple_loss=0.3092, pruned_loss=0.05664, over 85228.00 frames. 
], batch size: 866, lr: 1.33e-02, grad_scale: 32.0 2024-10-08 08:05:43,910 INFO [train.py:1159] (0/2) Computing validation loss 2024-10-08 08:05:49,654 INFO [zipformer.py:1883] (0/2) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.5380, 5.5733, 5.0040, 4.7026], device='cuda:0') 2024-10-08 08:05:54,934 INFO [train.py:1168] (0/2) Epoch 19, validation: loss=0.1707, simple_loss=0.2841, pruned_loss=0.02861, over 1382211.00 frames. 2024-10-08 08:05:54,934 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB 2024-10-08 08:06:02,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=176256.0, ans=0.0 2024-10-08 08:06:14,057 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=176376.0, ans=0.0 2024-10-08 08:06:28,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=176376.0, ans=0.0 2024-10-08 08:06:58,772 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=176616.0, ans=0.125 2024-10-08 08:07:04,273 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=176616.0, ans=0.1 2024-10-08 08:07:24,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=176736.0, ans=0.025 2024-10-08 08:07:29,733 INFO [train.py:1136] (0/2) Epoch 19, batch 50, loss[loss=0.1916, simple_loss=0.2876, pruned_loss=0.04776, over 87422.00 frames. ], tot_loss[loss=0.2038, simple_loss=0.2991, pruned_loss=0.05427, over 3855650.82 frames. ], batch size: 313, lr: 1.32e-02, grad_scale: 32.0 2024-10-08 08:07:40,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=176856.0, ans=0.0 2024-10-08 08:07:40,489 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-10-08 08:07:41,126 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.39 vs. 
2024-10-08 08:07:43,617 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=176856.0, ans=0.0
2024-10-08 08:07:43,622 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=176856.0, ans=0.0
2024-10-08 08:07:57,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=176976.0, ans=0.125
2024-10-08 08:08:17,587 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.692e+02 4.639e+02 5.150e+02 5.741e+02 8.682e+02, threshold=1.030e+03, percent-clipped=0.0
2024-10-08 08:08:22,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=177096.0, ans=0.2
2024-10-08 08:08:28,094 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=177216.0, ans=0.125
2024-10-08 08:08:29,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=177216.0, ans=0.125
2024-10-08 08:08:39,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=177216.0, ans=0.1
2024-10-08 08:09:01,993 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=177456.0, ans=0.0
2024-10-08 08:09:03,245 INFO [train.py:1136] (0/2) Epoch 19, batch 100, loss[loss=0.1869, simple_loss=0.2895, pruned_loss=0.04214, over 87229.00 frames. ], tot_loss[loss=0.2002, simple_loss=0.2962, pruned_loss=0.05205, over 6827447.34 frames. ], batch size: 464, lr: 1.32e-02, grad_scale: 32.0
2024-10-08 08:09:25,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=177576.0, ans=0.0
2024-10-08 08:09:41,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=177696.0, ans=0.125
2024-10-08 08:10:03,262 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-10-08 08:10:06,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=177816.0, ans=0.0
2024-10-08 08:10:15,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=177816.0, ans=0.125
2024-10-08 08:10:37,080 INFO [train.py:1136] (0/2) Epoch 19, batch 150, loss[loss=0.1972, simple_loss=0.2909, pruned_loss=0.05181, over 87193.00 frames. ], tot_loss[loss=0.2002, simple_loss=0.2963, pruned_loss=0.05205, over 9110538.79 frames. ], batch size: 280, lr: 1.32e-02, grad_scale: 32.0
2024-10-08 08:10:37,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=178056.0, ans=0.1
2024-10-08 08:10:57,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=178176.0, ans=0.2
2024-10-08 08:11:25,013 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.755e+02 4.609e+02 5.239e+02 5.919e+02 7.582e+02, threshold=1.048e+03, percent-clipped=0.0
2024-10-08 08:11:43,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=178416.0, ans=0.125
2024-10-08 08:11:49,060 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=178416.0, ans=0.025
2024-10-08 08:12:11,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=178656.0, ans=0.125
2024-10-08 08:12:13,001 INFO [train.py:1136] (0/2) Epoch 19, batch 200, loss[loss=0.1834, simple_loss=0.2829, pruned_loss=0.04197, over 87397.00 frames. ], tot_loss[loss=0.2003, simple_loss=0.2964, pruned_loss=0.05212, over 10879988.46 frames. ], batch size: 439, lr: 1.32e-02, grad_scale: 32.0
2024-10-08 08:12:24,053 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=178656.0, ans=0.125
2024-10-08 08:13:09,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=179016.0, ans=0.125
2024-10-08 08:13:14,885 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=179016.0, ans=0.125
2024-10-08 08:13:16,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=179016.0, ans=0.125
2024-10-08 08:13:47,631 INFO [train.py:1136] (0/2) Epoch 19, batch 250, loss[loss=0.2103, simple_loss=0.3113, pruned_loss=0.05463, over 85453.00 frames. ], tot_loss[loss=0.1999, simple_loss=0.2963, pruned_loss=0.05177, over 12271261.17 frames. ], batch size: 787, lr: 1.32e-02, grad_scale: 32.0
2024-10-08 08:14:00,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=179256.0, ans=0.1
2024-10-08 08:14:20,009 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.49 vs. limit=22.5
2024-10-08 08:14:35,062 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.44 vs. limit=22.5
2024-10-08 08:14:35,677 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.945e+02 4.559e+02 4.804e+02 5.465e+02 8.278e+02, threshold=9.608e+02, percent-clipped=0.0
2024-10-08 08:14:58,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=179616.0, ans=0.0
2024-10-08 08:14:58,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=179616.0, ans=0.0
2024-10-08 08:15:03,802 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=179736.0, ans=0.125
2024-10-08 08:15:21,432 INFO [train.py:1136] (0/2) Epoch 19, batch 300, loss[loss=0.1949, simple_loss=0.2834, pruned_loss=0.05316, over 85801.00 frames. ], tot_loss[loss=0.2004, simple_loss=0.2966, pruned_loss=0.0521, over 13335090.75 frames. ], batch size: 180, lr: 1.32e-02, grad_scale: 32.0
2024-10-08 08:15:37,822 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=179856.0, ans=0.125
2024-10-08 08:15:54,999 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=179976.0, ans=0.0
2024-10-08 08:16:09,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=180096.0, ans=0.0
2024-10-08 08:16:28,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=180216.0, ans=0.1
2024-10-08 08:16:29,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=180216.0, ans=0.125
2024-10-08 08:16:31,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=180216.0, ans=0.0
2024-10-08 08:16:40,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=180336.0, ans=0.125
2024-10-08 08:16:53,265 INFO [train.py:1136] (0/2) Epoch 19, batch 350, loss[loss=0.205, simple_loss=0.3029, pruned_loss=0.05357, over 86303.00 frames. ], tot_loss[loss=0.2001, simple_loss=0.2963, pruned_loss=0.05195, over 14195363.41 frames. ], batch size: 620, lr: 1.31e-02, grad_scale: 32.0
2024-10-08 08:17:33,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=180696.0, ans=0.0
2024-10-08 08:17:34,388 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.27 vs. limit=6.0
2024-10-08 08:17:42,075 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.712e+02 4.390e+02 4.911e+02 5.448e+02 9.593e+02, threshold=9.822e+02, percent-clipped=0.0
2024-10-08 08:18:00,477 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=180816.0, ans=0.125
2024-10-08 08:18:00,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=180816.0, ans=0.1
2024-10-08 08:18:29,154 INFO [train.py:1136] (0/2) Epoch 19, batch 400, loss[loss=0.1906, simple_loss=0.2815, pruned_loss=0.0498, over 86700.00 frames. ], tot_loss[loss=0.2004, simple_loss=0.2965, pruned_loss=0.05213, over 14840680.44 frames. ], batch size: 213, lr: 1.31e-02, grad_scale: 32.0
2024-10-08 08:18:31,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=181056.0, ans=0.1
2024-10-08 08:18:44,113 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.55 vs. limit=15.0
2024-10-08 08:19:18,584 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.57 vs. limit=5.0
2024-10-08 08:19:19,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=181296.0, ans=0.125
2024-10-08 08:19:29,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=181416.0, ans=0.1
2024-10-08 08:19:38,573 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.66 vs. limit=15.0
2024-10-08 08:19:56,090 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-10-08 08:19:57,565 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=181536.0, ans=0.0
2024-10-08 08:20:02,249 INFO [train.py:1136] (0/2) Epoch 19, batch 450, loss[loss=0.1849, simple_loss=0.2794, pruned_loss=0.04515, over 86891.00 frames. ], tot_loss[loss=0.1995, simple_loss=0.2955, pruned_loss=0.05178, over 15372814.05 frames. ], batch size: 246, lr: 1.31e-02, grad_scale: 32.0
2024-10-08 08:20:26,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=181776.0, ans=0.0
2024-10-08 08:20:28,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=181776.0, ans=0.0
2024-10-08 08:20:28,158 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=181776.0, ans=0.0
2024-10-08 08:20:29,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=181776.0, ans=0.0
2024-10-08 08:20:44,396 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=181896.0, ans=0.0
2024-10-08 08:20:47,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=181896.0, ans=0.125
2024-10-08 08:20:50,632 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.629e+02 4.802e+02 5.424e+02 6.443e+02 9.840e+02, threshold=1.085e+03, percent-clipped=1.0
2024-10-08 08:20:53,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=181896.0, ans=0.0
2024-10-08 08:21:01,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=182016.0, ans=0.125
2024-10-08 08:21:24,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=182136.0, ans=0.1
2024-10-08 08:21:30,393 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.69 vs. limit=15.0
2024-10-08 08:21:38,814 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.26 vs. limit=6.0
2024-10-08 08:21:39,426 INFO [train.py:1136] (0/2) Epoch 19, batch 500, loss[loss=0.2113, simple_loss=0.309, pruned_loss=0.05681, over 86420.00 frames. ], tot_loss[loss=0.2007, simple_loss=0.2965, pruned_loss=0.05247, over 15738564.61 frames. ], batch size: 668, lr: 1.31e-02, grad_scale: 16.0
2024-10-08 08:21:54,783 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=2.89 vs. limit=15.0
2024-10-08 08:22:05,358 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=7.28 vs. limit=15.0
2024-10-08 08:23:11,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=182736.0, ans=0.125
2024-10-08 08:23:13,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=182736.0, ans=0.09899494936611666
2024-10-08 08:23:14,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=182856.0, ans=0.0
2024-10-08 08:23:16,043 INFO [train.py:1136] (0/2) Epoch 19, batch 550, loss[loss=0.2125, simple_loss=0.3062, pruned_loss=0.05937, over 86496.00 frames. ], tot_loss[loss=0.2005, simple_loss=0.2965, pruned_loss=0.05222, over 16033106.81 frames. ], batch size: 620, lr: 1.31e-02, grad_scale: 16.0
2024-10-08 08:24:01,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=183096.0, ans=0.0
2024-10-08 08:24:04,607 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.706e+02 4.292e+02 4.738e+02 5.810e+02 1.300e+03, threshold=9.476e+02, percent-clipped=2.0
2024-10-08 08:24:11,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=183216.0, ans=0.125
2024-10-08 08:24:37,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=183336.0, ans=0.0
2024-10-08 08:24:39,973 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.70 vs. limit=15.0
2024-10-08 08:24:42,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=183336.0, ans=0.0
2024-10-08 08:24:50,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=183456.0, ans=0.1
2024-10-08 08:24:51,989 INFO [train.py:1136] (0/2) Epoch 19, batch 600, loss[loss=0.2038, simple_loss=0.2993, pruned_loss=0.05413, over 86928.00 frames. ], tot_loss[loss=0.2008, simple_loss=0.2969, pruned_loss=0.05229, over 16261814.60 frames. ], batch size: 548, lr: 1.30e-02, grad_scale: 16.0
2024-10-08 08:25:21,194 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=183576.0, ans=0.09899494936611666
2024-10-08 08:25:31,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=183696.0, ans=0.0
2024-10-08 08:25:32,201 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.14 vs. limit=6.0
2024-10-08 08:25:33,118 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=183696.0, ans=0.125
2024-10-08 08:26:14,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=183936.0, ans=0.0
2024-10-08 08:26:27,219 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=184056.0, ans=0.0
2024-10-08 08:26:28,500 INFO [train.py:1136] (0/2) Epoch 19, batch 650, loss[loss=0.1925, simple_loss=0.2881, pruned_loss=0.04841, over 87119.00 frames. ], tot_loss[loss=0.2014, simple_loss=0.2976, pruned_loss=0.05261, over 16458381.35 frames. ], batch size: 330, lr: 1.30e-02, grad_scale: 16.0
2024-10-08 08:26:35,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=184056.0, ans=0.035
2024-10-08 08:26:55,585 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.57 vs. limit=22.5
2024-10-08 08:27:13,976 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.653e+02 4.563e+02 5.232e+02 5.909e+02 9.275e+02, threshold=1.046e+03, percent-clipped=0.0
2024-10-08 08:27:42,091 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=184536.0, ans=0.0
2024-10-08 08:27:54,808 INFO [train.py:1136] (0/2) Epoch 19, batch 700, loss[loss=0.1836, simple_loss=0.2772, pruned_loss=0.04498, over 86377.00 frames. ], tot_loss[loss=0.2015, simple_loss=0.2973, pruned_loss=0.05281, over 16556978.00 frames. ], batch size: 229, lr: 1.30e-02, grad_scale: 16.0
2024-10-08 08:27:56,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=184656.0, ans=0.125
2024-10-08 08:28:16,732 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.50 vs. limit=15.0
2024-10-08 08:28:46,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=185016.0, ans=0.05
2024-10-08 08:28:47,245 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.38 vs. limit=15.0
2024-10-08 08:29:07,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=185136.0, ans=0.07
2024-10-08 08:29:17,840 INFO [train.py:1136] (0/2) Epoch 19, batch 750, loss[loss=0.2222, simple_loss=0.3172, pruned_loss=0.06364, over 81890.00 frames. ], tot_loss[loss=0.2013, simple_loss=0.2971, pruned_loss=0.05272, over 16659679.40 frames. ], batch size: 1245, lr: 1.30e-02, grad_scale: 16.0
2024-10-08 08:29:39,281 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=2.87 vs. limit=15.0
2024-10-08 08:29:50,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=185496.0, ans=0.125
2024-10-08 08:29:54,851 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.57 vs. limit=15.0
2024-10-08 08:30:02,094 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.873e+02 4.694e+02 5.237e+02 6.068e+02 1.563e+03, threshold=1.047e+03, percent-clipped=2.0
2024-10-08 08:30:04,870 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.16 vs. limit=22.5
2024-10-08 08:30:17,374 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=185616.0, ans=0.125
2024-10-08 08:30:20,576 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-10-08 08:30:40,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=185856.0, ans=0.1
2024-10-08 08:30:40,608 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=3.30 vs. limit=12.0
2024-10-08 08:30:41,387 INFO [train.py:1136] (0/2) Epoch 19, batch 800, loss[loss=0.214, simple_loss=0.3119, pruned_loss=0.05803, over 86017.00 frames. ], tot_loss[loss=0.2017, simple_loss=0.2972, pruned_loss=0.05305, over 16701858.47 frames. ], batch size: 721, lr: 1.30e-02, grad_scale: 32.0
2024-10-08 08:30:50,886 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.31 vs. limit=12.0
2024-10-08 08:31:07,392 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-19.pt
2024-10-08 08:32:01,829 INFO [train.py:1136] (0/2) Epoch 20, batch 0, loss[loss=0.1969, simple_loss=0.2908, pruned_loss=0.05155, over 87332.00 frames. ], tot_loss[loss=0.1969, simple_loss=0.2908, pruned_loss=0.05155, over 87332.00 frames. ], batch size: 313, lr: 1.26e-02, grad_scale: 32.0
2024-10-08 08:32:01,830 INFO [train.py:1159] (0/2) Computing validation loss
2024-10-08 08:32:06,807 INFO [zipformer.py:1883] (0/2) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1122, 4.9836, 4.7558, 4.5319], device='cuda:0')
2024-10-08 08:32:12,796 INFO [train.py:1168] (0/2) Epoch 20, validation: loss=0.1695, simple_loss=0.2834, pruned_loss=0.02773, over 1382211.00 frames.
2024-10-08 08:32:12,797 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB
2024-10-08 08:32:35,974 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=186168.0, ans=0.125
2024-10-08 08:32:39,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=186168.0, ans=0.0
2024-10-08 08:33:31,924 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=186528.0, ans=0.125
2024-10-08 08:33:35,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=186528.0, ans=0.125
2024-10-08 08:33:47,234 INFO [train.py:1136] (0/2) Epoch 20, batch 50, loss[loss=0.2036, simple_loss=0.3007, pruned_loss=0.05319, over 86977.00 frames. ], tot_loss[loss=0.2011, simple_loss=0.297, pruned_loss=0.05258, over 3846650.47 frames. ], batch size: 583, lr: 1.26e-02, grad_scale: 32.0
2024-10-08 08:33:58,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=186648.0, ans=0.125
2024-10-08 08:34:07,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=186768.0, ans=0.125
2024-10-08 08:34:08,384 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.664e+02 4.600e+02 5.332e+02 6.130e+02 8.445e+02, threshold=1.066e+03, percent-clipped=0.0
2024-10-08 08:34:28,907 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.29 vs. limit=15.0
2024-10-08 08:34:38,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=186888.0, ans=0.0
2024-10-08 08:35:01,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=187128.0, ans=0.125
2024-10-08 08:35:23,502 INFO [train.py:1136] (0/2) Epoch 20, batch 100, loss[loss=0.2106, simple_loss=0.3093, pruned_loss=0.05598, over 85332.00 frames. ], tot_loss[loss=0.1971, simple_loss=0.2937, pruned_loss=0.05029, over 6822987.92 frames. ], batch size: 866, lr: 1.26e-02, grad_scale: 32.0
2024-10-08 08:35:25,835 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=187248.0, ans=0.125
2024-10-08 08:35:44,490 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=187368.0, ans=0.1
2024-10-08 08:35:56,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=187368.0, ans=0.125
2024-10-08 08:36:04,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=187488.0, ans=0.025
2024-10-08 08:36:06,218 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.75 vs. limit=15.0
2024-10-08 08:36:09,060 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=187488.0, ans=0.2
2024-10-08 08:36:17,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=187488.0, ans=0.125
2024-10-08 08:36:23,342 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.50 vs. limit=15.0
2024-10-08 08:36:25,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=187608.0, ans=0.125
2024-10-08 08:36:56,614 INFO [train.py:1136] (0/2) Epoch 20, batch 150, loss[loss=0.1858, simple_loss=0.2796, pruned_loss=0.04603, over 87403.00 frames. ], tot_loss[loss=0.1981, simple_loss=0.2944, pruned_loss=0.0509, over 9092275.42 frames. ], batch size: 296, lr: 1.26e-02, grad_scale: 32.0
2024-10-08 08:36:58,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=187848.0, ans=0.1
2024-10-08 08:37:05,158 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=187848.0, ans=0.0
2024-10-08 08:37:18,291 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.696e+02 4.260e+02 4.599e+02 5.216e+02 8.952e+02, threshold=9.198e+02, percent-clipped=0.0
2024-10-08 08:38:33,688 INFO [train.py:1136] (0/2) Epoch 20, batch 200, loss[loss=0.2086, simple_loss=0.3072, pruned_loss=0.05501, over 85302.00 frames. ], tot_loss[loss=0.1983, simple_loss=0.2947, pruned_loss=0.05096, over 10852188.20 frames. ], batch size: 866, lr: 1.26e-02, grad_scale: 16.0
2024-10-08 08:39:11,378 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=188688.0, ans=0.04949747468305833
2024-10-08 08:39:27,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=188688.0, ans=0.0
2024-10-08 08:39:30,808 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=188808.0, ans=0.1
2024-10-08 08:39:44,013 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=188808.0, ans=0.2
2024-10-08 08:39:47,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=188808.0, ans=0.0
2024-10-08 08:39:53,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=188928.0, ans=0.1
2024-10-08 08:40:05,301 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.98 vs. limit=10.0
2024-10-08 08:40:07,641 INFO [train.py:1136] (0/2) Epoch 20, batch 250, loss[loss=0.1805, simple_loss=0.2765, pruned_loss=0.0423, over 86815.00 frames. ], tot_loss[loss=0.1977, simple_loss=0.2942, pruned_loss=0.05063, over 12233398.27 frames. ], batch size: 229, lr: 1.26e-02, grad_scale: 16.0
2024-10-08 08:40:21,522 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.96 vs. limit=22.5
2024-10-08 08:40:31,058 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.627e+02 4.671e+02 5.427e+02 6.324e+02 8.354e+02, threshold=1.085e+03, percent-clipped=0.0
2024-10-08 08:40:32,073 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.05 vs. limit=15.0
2024-10-08 08:40:53,262 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=189288.0, ans=0.125
2024-10-08 08:41:44,973 INFO [train.py:1136] (0/2) Epoch 20, batch 300, loss[loss=0.1803, simple_loss=0.2818, pruned_loss=0.03936, over 87397.00 frames. ], tot_loss[loss=0.198, simple_loss=0.2947, pruned_loss=0.05069, over 13333271.55 frames. ], batch size: 415, lr: 1.25e-02, grad_scale: 16.0
2024-10-08 08:42:10,301 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=189768.0, ans=0.2
2024-10-08 08:43:17,394 INFO [train.py:1136] (0/2) Epoch 20, batch 350, loss[loss=0.1872, simple_loss=0.291, pruned_loss=0.04168, over 87440.00 frames. ], tot_loss[loss=0.1978, simple_loss=0.2945, pruned_loss=0.0505, over 14196160.97 frames. ], batch size: 490, lr: 1.25e-02, grad_scale: 16.0
2024-10-08 08:43:17,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=190248.0, ans=0.0
2024-10-08 08:43:23,679 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=190248.0, ans=0.0
2024-10-08 08:43:40,190 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.708e+02 4.189e+02 4.588e+02 5.446e+02 8.634e+02, threshold=9.175e+02, percent-clipped=0.0
2024-10-08 08:43:52,091 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=190368.0, ans=0.125
2024-10-08 08:44:00,624 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=190488.0, ans=0.2
2024-10-08 08:44:07,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=190488.0, ans=0.125
2024-10-08 08:44:44,557 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.63 vs. limit=15.0
2024-10-08 08:44:53,895 INFO [train.py:1136] (0/2) Epoch 20, batch 400, loss[loss=0.1906, simple_loss=0.2796, pruned_loss=0.05079, over 85885.00 frames. ], tot_loss[loss=0.198, simple_loss=0.2946, pruned_loss=0.05073, over 14830982.86 frames. ], batch size: 180, lr: 1.25e-02, grad_scale: 16.0
2024-10-08 08:44:58,287 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.57 vs. limit=15.0
2024-10-08 08:45:17,828 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=190968.0, ans=0.0
2024-10-08 08:45:31,778 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=191088.0, ans=0.2
2024-10-08 08:46:22,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=191328.0, ans=0.125
2024-10-08 08:46:29,405 INFO [train.py:1136] (0/2) Epoch 20, batch 450, loss[loss=0.2045, simple_loss=0.2997, pruned_loss=0.05463, over 87047.00 frames. ], tot_loss[loss=0.1986, simple_loss=0.295, pruned_loss=0.05105, over 15339467.57 frames. ], batch size: 548, lr: 1.25e-02, grad_scale: 16.0
2024-10-08 08:46:29,863 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=191448.0, ans=0.1
2024-10-08 08:46:30,470 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.63 vs. limit=15.0
2024-10-08 08:46:44,376 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.00 vs. limit=22.5
2024-10-08 08:46:54,351 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.765e+02 4.502e+02 5.260e+02 6.134e+02 1.687e+03, threshold=1.052e+03, percent-clipped=2.0
2024-10-08 08:46:54,934 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=191568.0, ans=0.0
2024-10-08 08:47:27,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=191808.0, ans=0.125
2024-10-08 08:47:39,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=191808.0, ans=0.0
2024-10-08 08:47:51,120 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/checkpoint-16000.pt
2024-10-08 08:48:52,633 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-10-08 08:48:57,230 INFO [train.py:1136] (0/2) Epoch 20, batch 500, loss[loss=0.2397, simple_loss=0.3234, pruned_loss=0.07802, over 69399.00 frames. ], tot_loss[loss=0.1976, simple_loss=0.2941, pruned_loss=0.05052, over 15758368.12 frames. ], batch size: 1960, lr: 1.25e-02, grad_scale: 16.0
2024-10-08 08:50:30,533 INFO [train.py:1136] (0/2) Epoch 20, batch 550, loss[loss=0.2106, simple_loss=0.3095, pruned_loss=0.05587, over 85578.00 frames. ], tot_loss[loss=0.1976, simple_loss=0.2942, pruned_loss=0.05051, over 16064759.22 frames. ], batch size: 787, lr: 1.25e-02, grad_scale: 16.0
2024-10-08 08:50:40,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=192648.0, ans=15.0
2024-10-08 08:50:42,106 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.24 vs. limit=15.0
2024-10-08 08:50:55,068 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.469e+02 4.214e+02 4.816e+02 5.845e+02 1.176e+03, threshold=9.632e+02, percent-clipped=1.0
2024-10-08 08:51:20,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=192888.0, ans=0.0
2024-10-08 08:51:26,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=193008.0, ans=0.125
2024-10-08 08:51:27,415 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.72 vs. limit=15.0
2024-10-08 08:51:28,743 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.80 vs. limit=15.0
2024-10-08 08:51:30,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=193008.0, ans=0.025
2024-10-08 08:51:34,858 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.68 vs. limit=15.0
2024-10-08 08:52:05,134 INFO [train.py:1136] (0/2) Epoch 20, batch 600, loss[loss=0.1822, simple_loss=0.2763, pruned_loss=0.04407, over 86084.00 frames. ], tot_loss[loss=0.1981, simple_loss=0.2945, pruned_loss=0.05078, over 16315395.23 frames. ], batch size: 197, lr: 1.24e-02, grad_scale: 16.0
2024-10-08 08:52:08,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=193248.0, ans=0.125
2024-10-08 08:52:12,812 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.39 vs. limit=15.0
2024-10-08 08:52:13,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=193248.0, ans=0.0
2024-10-08 08:52:37,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=193368.0, ans=0.125
2024-10-08 08:52:37,343 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=193368.0, ans=0.125
2024-10-08 08:52:55,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=193488.0, ans=0.0
2024-10-08 08:53:10,237 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=193608.0, ans=0.125
2024-10-08 08:53:11,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=193608.0, ans=0.0
2024-10-08 08:53:38,067 INFO [train.py:1136] (0/2) Epoch 20, batch 650, loss[loss=0.1852, simple_loss=0.2865, pruned_loss=0.04193, over 87310.00 frames. ], tot_loss[loss=0.1978, simple_loss=0.2941, pruned_loss=0.05072, over 16464019.90 frames. ], batch size: 439, lr: 1.24e-02, grad_scale: 16.0
2024-10-08 08:54:06,068 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.527e+02 4.200e+02 4.638e+02 5.396e+02 7.629e+02, threshold=9.276e+02, percent-clipped=0.0
2024-10-08 08:55:09,087 INFO [train.py:1136] (0/2) Epoch 20, batch 700, loss[loss=0.191, simple_loss=0.2937, pruned_loss=0.04414, over 87403.00 frames. ], tot_loss[loss=0.1979, simple_loss=0.2944, pruned_loss=0.05067, over 16596303.73 frames. ], batch size: 490, lr: 1.24e-02, grad_scale: 16.0
2024-10-08 08:55:25,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=194568.0, ans=0.0
2024-10-08 08:55:25,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=194568.0, ans=0.125
2024-10-08 08:55:56,723 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=194688.0, ans=0.1
2024-10-08 08:56:32,807 INFO [train.py:1136] (0/2) Epoch 20, batch 750, loss[loss=0.2519, simple_loss=0.3376, pruned_loss=0.0831, over 78819.00 frames. ], tot_loss[loss=0.1974, simple_loss=0.294, pruned_loss=0.05045, over 16706012.62 frames. ], batch size: 1493, lr: 1.24e-02, grad_scale: 16.0
2024-10-08 08:56:40,947 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=195048.0, ans=0.0
2024-10-08 08:56:54,969 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.660e+02 4.469e+02 5.174e+02 6.251e+02 9.708e+02, threshold=1.035e+03, percent-clipped=1.0
2024-10-08 08:57:31,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=195408.0, ans=0.0
2024-10-08 08:57:50,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=195528.0, ans=0.125
2024-10-08 08:57:52,482 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-10-08 08:57:52,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=195528.0, ans=0.2
2024-10-08 08:57:58,483 INFO [train.py:1136] (0/2) Epoch 20, batch 800, loss[loss=0.1862, simple_loss=0.2861, pruned_loss=0.04311, over 87316.00 frames. ], tot_loss[loss=0.1993, simple_loss=0.2954, pruned_loss=0.05157, over 16683181.27 frames. ], batch size: 415, lr: 1.24e-02, grad_scale: 32.0
2024-10-08 08:58:05,776 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=195648.0, ans=0.125
2024-10-08 08:58:13,661 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.88 vs. limit=15.0
2024-10-08 08:58:24,515 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-20.pt
2024-10-08 09:00:12,259 INFO [train.py:1136] (0/2) Epoch 21, batch 0, loss[loss=0.1875, simple_loss=0.2824, pruned_loss=0.04634, over 86334.00 frames. ], tot_loss[loss=0.1875, simple_loss=0.2824, pruned_loss=0.04634, over 86334.00 frames. ], batch size: 197, lr: 1.21e-02, grad_scale: 32.0
2024-10-08 09:00:12,260 INFO [train.py:1159] (0/2) Computing validation loss
2024-10-08 09:00:24,088 INFO [train.py:1168] (0/2) Epoch 21, validation: loss=0.1698, simple_loss=0.2834, pruned_loss=0.0281, over 1382211.00 frames.
2024-10-08 09:00:24,089 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB
2024-10-08 09:01:19,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=196200.0, ans=0.125
2024-10-08 09:01:54,844 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.589e+02 4.884e+02 5.566e+02 6.302e+02 9.094e+02, threshold=1.113e+03, percent-clipped=0.0
2024-10-08 09:02:00,878 INFO [train.py:1136] (0/2) Epoch 21, batch 50, loss[loss=0.2068, simple_loss=0.3064, pruned_loss=0.05357, over 84567.00 frames. ], tot_loss[loss=0.1961, simple_loss=0.2919, pruned_loss=0.05012, over 3854613.55 frames. ], batch size: 958, lr: 1.21e-02, grad_scale: 16.0
2024-10-08 09:02:03,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=196440.0, ans=0.0
2024-10-08 09:02:11,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=196440.0, ans=0.125
2024-10-08 09:02:36,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=196680.0, ans=0.1
2024-10-08 09:02:42,234 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.80 vs. limit=15.0
2024-10-08 09:02:42,392 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.81 vs. limit=10.0
2024-10-08 09:02:58,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=196800.0, ans=0.0
2024-10-08 09:03:15,881 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=196920.0, ans=0.125
2024-10-08 09:03:26,247 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.11 vs. limit=15.0
2024-10-08 09:03:31,994 INFO [train.py:1136] (0/2) Epoch 21, batch 100, loss[loss=0.1814, simple_loss=0.281, pruned_loss=0.04088, over 87406.00 frames. ], tot_loss[loss=0.1947, simple_loss=0.2911, pruned_loss=0.04919, over 6818269.76 frames. ], batch size: 415, lr: 1.20e-02, grad_scale: 16.0
2024-10-08 09:04:20,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=197280.0, ans=0.125
2024-10-08 09:04:37,256 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=197400.0, ans=0.0
2024-10-08 09:04:57,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=197520.0, ans=0.025
2024-10-08 09:04:59,936 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.63 vs. limit=8.0
2024-10-08 09:05:02,617 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-10-08 09:05:03,671 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.709e+02 4.441e+02 5.021e+02 5.803e+02 8.582e+02, threshold=1.004e+03, percent-clipped=0.0
2024-10-08 09:05:06,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=197640.0, ans=0.1
2024-10-08 09:05:07,270 INFO [train.py:1136] (0/2) Epoch 21, batch 150, loss[loss=0.2081, simple_loss=0.3022, pruned_loss=0.05699, over 87080.00 frames. ], tot_loss[loss=0.1942, simple_loss=0.2908, pruned_loss=0.0488, over 9100132.60 frames. ], batch size: 548, lr: 1.20e-02, grad_scale: 16.0
2024-10-08 09:05:40,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=197760.0, ans=0.125
2024-10-08 09:05:40,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=197760.0, ans=0.125
2024-10-08 09:05:42,875 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.57 vs. limit=10.0
2024-10-08 09:06:07,750 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=198000.0, ans=0.1
2024-10-08 09:06:15,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=198000.0, ans=0.05
2024-10-08 09:06:40,078 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=198120.0, ans=0.0
2024-10-08 09:06:40,201 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-10-08 09:06:43,213 INFO [train.py:1136] (0/2) Epoch 21, batch 200, loss[loss=0.1883, simple_loss=0.2878, pruned_loss=0.04437, over 87229.00 frames. ], tot_loss[loss=0.1955, simple_loss=0.2921, pruned_loss=0.04948, over 10840300.61 frames. ], batch size: 415, lr: 1.20e-02, grad_scale: 8.0
2024-10-08 09:07:11,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=198360.0, ans=0.025
2024-10-08 09:07:24,940 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=198480.0, ans=0.0
2024-10-08 09:07:58,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=198720.0, ans=0.0
2024-10-08 09:08:00,314 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-10-08 09:08:16,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=198720.0, ans=0.125
2024-10-08 09:08:17,900 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.863e+02 4.622e+02 5.085e+02 6.135e+02 9.095e+02, threshold=1.017e+03, percent-clipped=0.0
2024-10-08 09:08:19,664 INFO [train.py:1136] (0/2) Epoch 21, batch 250, loss[loss=0.1777, simple_loss=0.2715, pruned_loss=0.04194, over 86606.00 frames. ], tot_loss[loss=0.1953, simple_loss=0.2918, pruned_loss=0.04947, over 12220802.13 frames. ], batch size: 246, lr: 1.20e-02, grad_scale: 8.0
2024-10-08 09:08:20,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=198840.0, ans=0.125
2024-10-08 09:08:27,497 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.06 vs. limit=15.0
2024-10-08 09:08:28,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=198840.0, ans=0.0
2024-10-08 09:09:12,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=199080.0, ans=0.125
2024-10-08 09:09:14,426 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. limit=6.0
2024-10-08 09:09:49,359 INFO [train.py:1136] (0/2) Epoch 21, batch 300, loss[loss=0.1741, simple_loss=0.2711, pruned_loss=0.03857, over 86630.00 frames. ], tot_loss[loss=0.1952, simple_loss=0.2915, pruned_loss=0.04945, over 13319051.59 frames. ], batch size: 246, lr: 1.20e-02, grad_scale: 8.0
2024-10-08 09:10:25,401 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=199560.0, ans=0.09899494936611666
2024-10-08 09:10:30,434 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=199680.0, ans=0.125
2024-10-08 09:10:37,413 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=199680.0, ans=0.1
2024-10-08 09:10:39,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=199680.0, ans=15.0
2024-10-08 09:11:04,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=199800.0, ans=0.125
2024-10-08 09:11:07,516 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=199920.0, ans=0.2
2024-10-08 09:11:09,284 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-10-08 09:11:16,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=199920.0, ans=0.2
2024-10-08 09:11:18,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=199920.0, ans=0.1
2024-10-08 09:11:22,851 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.638e+02 4.246e+02 4.741e+02 5.459e+02 8.220e+02, threshold=9.481e+02, percent-clipped=0.0
2024-10-08 09:11:24,637 INFO [train.py:1136] (0/2) Epoch 21, batch 350, loss[loss=0.2242, simple_loss=0.3208, pruned_loss=0.0638, over 81803.00 frames. ], tot_loss[loss=0.1951, simple_loss=0.2915, pruned_loss=0.0494, over 14142379.15 frames. ], batch size: 1245, lr: 1.20e-02, grad_scale: 8.0
2024-10-08 09:12:18,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=200280.0, ans=0.125
2024-10-08 09:12:28,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=200400.0, ans=0.2
2024-10-08 09:12:32,618 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=5.52 vs. limit=15.0
2024-10-08 09:12:48,205 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=200520.0, ans=0.1
2024-10-08 09:12:57,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=200640.0, ans=0.0
2024-10-08 09:12:58,241 INFO [train.py:1136] (0/2) Epoch 21, batch 400, loss[loss=0.1908, simple_loss=0.291, pruned_loss=0.04531, over 87387.00 frames. ], tot_loss[loss=0.1952, simple_loss=0.2919, pruned_loss=0.04923, over 14821486.00 frames. ], batch size: 464, lr: 1.20e-02, grad_scale: 16.0
2024-10-08 09:13:01,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=200640.0, ans=0.07
2024-10-08 09:14:13,674 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=201000.0, ans=0.125
2024-10-08 09:14:17,729 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.32 vs. limit=6.0
2024-10-08 09:14:18,861 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=201120.0, ans=0.1
2024-10-08 09:14:33,527 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.869e+02 4.335e+02 4.976e+02 5.697e+02 1.060e+03, threshold=9.953e+02, percent-clipped=1.0
2024-10-08 09:14:35,411 INFO [train.py:1136] (0/2) Epoch 21, batch 450, loss[loss=0.1837, simple_loss=0.2765, pruned_loss=0.04542, over 86662.00 frames. ], tot_loss[loss=0.1958, simple_loss=0.2923, pruned_loss=0.04962, over 15317511.39 frames. ], batch size: 229, lr: 1.19e-02, grad_scale: 16.0
2024-10-08 09:14:37,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=201240.0, ans=0.0
2024-10-08 09:14:47,858 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=201240.0, ans=0.125
2024-10-08 09:14:49,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=201240.0, ans=0.125
2024-10-08 09:14:49,622 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=201240.0, ans=0.125
2024-10-08 09:15:26,067 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=201480.0, ans=0.2
2024-10-08 09:15:34,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=201600.0, ans=0.0
2024-10-08 09:15:34,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=201600.0, ans=0.125
2024-10-08 09:15:48,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=201600.0, ans=0.2
2024-10-08 09:16:14,668 INFO [train.py:1136] (0/2) Epoch 21, batch 500, loss[loss=0.1805, simple_loss=0.2847, pruned_loss=0.0382, over 87377.00 frames. ], tot_loss[loss=0.1958, simple_loss=0.2923, pruned_loss=0.04965, over 15715031.99 frames. ], batch size: 490, lr: 1.19e-02, grad_scale: 16.0
2024-10-08 09:16:15,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=201840.0, ans=0.125
2024-10-08 09:16:24,067 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=201840.0, ans=0.125
2024-10-08 09:16:29,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=201840.0, ans=0.125
2024-10-08 09:16:45,332 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-10-08 09:16:51,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=202080.0, ans=0.2
2024-10-08 09:17:12,789 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.33 vs. limit=15.0
2024-10-08 09:17:45,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=202320.0, ans=0.125
2024-10-08 09:17:46,720 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.542e+02 4.465e+02 5.087e+02 5.707e+02 8.376e+02, threshold=1.017e+03, percent-clipped=0.0
2024-10-08 09:17:48,455 INFO [train.py:1136] (0/2) Epoch 21, batch 550, loss[loss=0.2133, simple_loss=0.3112, pruned_loss=0.05775, over 85190.00 frames. ], tot_loss[loss=0.1968, simple_loss=0.2933, pruned_loss=0.05011, over 16017991.77 frames. ], batch size: 866, lr: 1.19e-02, grad_scale: 16.0
2024-10-08 09:17:53,614 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.21 vs. limit=15.0
limit=15.0 2024-10-08 09:17:53,987 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.04 vs. limit=22.5 2024-10-08 09:17:58,570 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=202440.0, ans=0.1 2024-10-08 09:18:05,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=202440.0, ans=0.125 2024-10-08 09:18:43,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=202800.0, ans=0.125 2024-10-08 09:19:20,049 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.84 vs. limit=22.5 2024-10-08 09:19:22,436 INFO [train.py:1136] (0/2) Epoch 21, batch 600, loss[loss=0.1827, simple_loss=0.2867, pruned_loss=0.03934, over 87183.00 frames. ], tot_loss[loss=0.1963, simple_loss=0.2927, pruned_loss=0.04995, over 16236956.87 frames. ], batch size: 517, lr: 1.19e-02, grad_scale: 16.0 2024-10-08 09:19:23,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=203040.0, ans=0.0 2024-10-08 09:19:25,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=203040.0, ans=0.125 2024-10-08 09:19:40,936 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.73 vs. limit=15.0 2024-10-08 09:19:48,907 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=203160.0, ans=0.1 2024-10-08 09:20:02,653 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-10-08 09:20:29,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=203400.0, ans=0.0 2024-10-08 09:20:52,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=203520.0, ans=0.2 2024-10-08 09:20:57,176 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.649e+02 4.336e+02 4.827e+02 5.529e+02 8.643e+02, threshold=9.655e+02, percent-clipped=0.0 2024-10-08 09:20:57,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=203640.0, ans=0.1 2024-10-08 09:20:58,982 INFO [train.py:1136] (0/2) Epoch 21, batch 650, loss[loss=0.1883, simple_loss=0.2831, pruned_loss=0.04669, over 87235.00 frames. ], tot_loss[loss=0.1968, simple_loss=0.2935, pruned_loss=0.05006, over 16395596.19 frames. ], batch size: 280, lr: 1.19e-02, grad_scale: 16.0 2024-10-08 09:21:41,304 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.25 vs. 
limit=22.5 2024-10-08 09:21:55,926 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=204000.0, ans=0.025 2024-10-08 09:22:12,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=204120.0, ans=0.1 2024-10-08 09:22:17,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=204120.0, ans=0.0 2024-10-08 09:22:26,399 INFO [train.py:1136] (0/2) Epoch 21, batch 700, loss[loss=0.1826, simple_loss=0.2819, pruned_loss=0.04161, over 87477.00 frames. ], tot_loss[loss=0.1966, simple_loss=0.2933, pruned_loss=0.04988, over 16554652.00 frames. ], batch size: 393, lr: 1.19e-02, grad_scale: 16.0 2024-10-08 09:22:40,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=204240.0, ans=0.025 2024-10-08 09:22:47,006 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.33 vs. limit=22.5 2024-10-08 09:23:16,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=204600.0, ans=0.07 2024-10-08 09:23:26,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=204600.0, ans=0.07 2024-10-08 09:23:49,538 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.579e+02 4.196e+02 4.721e+02 5.535e+02 8.502e+02, threshold=9.442e+02, percent-clipped=0.0 2024-10-08 09:23:52,417 INFO [train.py:1136] (0/2) Epoch 21, batch 750, loss[loss=0.1861, simple_loss=0.2894, pruned_loss=0.04144, over 87423.00 frames. ], tot_loss[loss=0.1962, simple_loss=0.2934, pruned_loss=0.04954, over 16699507.15 frames. ], batch size: 464, lr: 1.18e-02, grad_scale: 16.0 2024-10-08 09:24:54,185 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=205200.0, ans=0.025 2024-10-08 09:25:02,469 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=5.27 vs. limit=15.0 2024-10-08 09:25:13,803 INFO [train.py:1136] (0/2) Epoch 21, batch 800, loss[loss=0.1901, simple_loss=0.2922, pruned_loss=0.04406, over 87328.00 frames. ], tot_loss[loss=0.1962, simple_loss=0.2938, pruned_loss=0.04929, over 16781160.23 frames. ], batch size: 464, lr: 1.18e-02, grad_scale: 32.0 2024-10-08 09:25:26,424 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.34 vs. limit=22.5 2024-10-08 09:25:33,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=205560.0, ans=0.125 2024-10-08 09:25:39,174 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-21.pt 2024-10-08 09:26:34,682 INFO [train.py:1136] (0/2) Epoch 22, batch 0, loss[loss=0.2128, simple_loss=0.314, pruned_loss=0.05579, over 85383.00 frames. ], tot_loss[loss=0.2128, simple_loss=0.314, pruned_loss=0.05579, over 85383.00 frames. 
], batch size: 866, lr: 1.16e-02, grad_scale: 32.0 2024-10-08 09:26:34,683 INFO [train.py:1159] (0/2) Computing validation loss 2024-10-08 09:26:47,675 INFO [train.py:1168] (0/2) Epoch 22, validation: loss=0.1689, simple_loss=0.2819, pruned_loss=0.02796, over 1382211.00 frames. 2024-10-08 09:26:47,675 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB 2024-10-08 09:26:48,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=205632.0, ans=0.2 2024-10-08 09:26:56,571 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-10-08 09:27:47,929 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-10-08 09:27:48,963 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.738e+02 4.669e+02 5.209e+02 6.181e+02 8.513e+02, threshold=1.042e+03, percent-clipped=0.0 2024-10-08 09:27:58,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=205992.0, ans=0.0 2024-10-08 09:28:07,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=206112.0, ans=0.125 2024-10-08 09:28:17,673 INFO [train.py:1136] (0/2) Epoch 22, batch 50, loss[loss=0.1845, simple_loss=0.2866, pruned_loss=0.04124, over 87354.00 frames. ], tot_loss[loss=0.1933, simple_loss=0.2902, pruned_loss=0.0482, over 3869078.58 frames. ], batch size: 490, lr: 1.15e-02, grad_scale: 32.0 2024-10-08 09:28:24,787 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=206232.0, ans=0.125 2024-10-08 09:28:57,768 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.41 vs. limit=10.0 2024-10-08 09:29:04,114 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=206472.0, ans=0.125 2024-10-08 09:29:53,885 INFO [train.py:1136] (0/2) Epoch 22, batch 100, loss[loss=0.2079, simple_loss=0.3043, pruned_loss=0.05571, over 86366.00 frames. ], tot_loss[loss=0.192, simple_loss=0.2896, pruned_loss=0.04713, over 6847960.17 frames. 
], batch size: 620, lr: 1.15e-02, grad_scale: 32.0 2024-10-08 09:30:07,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=206832.0, ans=10.0 2024-10-08 09:30:09,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=206952.0, ans=0.0 2024-10-08 09:30:15,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=206952.0, ans=0.125 2024-10-08 09:30:38,259 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-10-08 09:30:40,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=207072.0, ans=0.125 2024-10-08 09:30:58,653 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.732e+02 4.239e+02 4.765e+02 5.341e+02 1.091e+03, threshold=9.530e+02, percent-clipped=1.0 2024-10-08 09:31:02,345 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=207192.0, ans=0.125 2024-10-08 09:31:02,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=207192.0, ans=0.1 2024-10-08 09:31:15,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=207312.0, ans=0.2 2024-10-08 09:31:29,706 INFO [train.py:1136] (0/2) Epoch 22, batch 150, loss[loss=0.2367, simple_loss=0.322, pruned_loss=0.07568, over 69560.00 frames. ], tot_loss[loss=0.1938, simple_loss=0.2912, pruned_loss=0.04819, over 9088782.13 frames. ], batch size: 1960, lr: 1.15e-02, grad_scale: 8.0 2024-10-08 09:31:30,602 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.54 vs. limit=22.5 2024-10-08 09:32:09,004 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-10-08 09:32:33,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=207792.0, ans=0.125 2024-10-08 09:33:02,966 INFO [train.py:1136] (0/2) Epoch 22, batch 200, loss[loss=0.1825, simple_loss=0.2791, pruned_loss=0.04293, over 87230.00 frames. ], tot_loss[loss=0.1931, simple_loss=0.2902, pruned_loss=0.04803, over 10898327.16 frames. ], batch size: 350, lr: 1.15e-02, grad_scale: 8.0 2024-10-08 09:33:26,834 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.06 vs. limit=22.5 2024-10-08 09:33:37,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=208152.0, ans=0.0 2024-10-08 09:33:44,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=208272.0, ans=0.125 2024-10-08 09:34:10,530 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.662e+02 4.434e+02 4.863e+02 5.592e+02 8.133e+02, threshold=9.727e+02, percent-clipped=0.0 2024-10-08 09:34:38,776 INFO [train.py:1136] (0/2) Epoch 22, batch 250, loss[loss=0.2013, simple_loss=0.2989, pruned_loss=0.05184, over 85930.00 frames. ], tot_loss[loss=0.1939, simple_loss=0.2913, pruned_loss=0.04826, over 12280034.02 frames. 
], batch size: 721, lr: 1.15e-02, grad_scale: 8.0 2024-10-08 09:35:18,304 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=208872.0, ans=0.05 2024-10-08 09:35:31,929 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.64 vs. limit=15.0 2024-10-08 09:35:42,152 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.91 vs. limit=12.0 2024-10-08 09:36:13,568 INFO [train.py:1136] (0/2) Epoch 22, batch 300, loss[loss=0.2016, simple_loss=0.2969, pruned_loss=0.05317, over 86809.00 frames. ], tot_loss[loss=0.1947, simple_loss=0.292, pruned_loss=0.04869, over 13303583.54 frames. ], batch size: 547, lr: 1.15e-02, grad_scale: 8.0 2024-10-08 09:36:29,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=209232.0, ans=0.125 2024-10-08 09:36:35,285 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=209352.0, ans=0.07 2024-10-08 09:36:44,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=209352.0, ans=0.125 2024-10-08 09:37:21,177 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.728e+02 4.303e+02 4.755e+02 5.365e+02 1.949e+03, threshold=9.509e+02, percent-clipped=1.0 2024-10-08 09:37:30,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=209712.0, ans=0.07 2024-10-08 09:37:49,743 INFO [train.py:1136] (0/2) Epoch 22, batch 350, loss[loss=0.1743, simple_loss=0.2688, pruned_loss=0.03991, over 86893.00 frames. ], tot_loss[loss=0.1947, simple_loss=0.292, pruned_loss=0.04876, over 14138026.38 frames. ], batch size: 229, lr: 1.15e-02, grad_scale: 8.0 2024-10-08 09:38:03,029 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-10-08 09:38:11,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=209952.0, ans=0.035 2024-10-08 09:38:31,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=210072.0, ans=0.0 2024-10-08 09:39:01,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=210192.0, ans=0.0 2024-10-08 09:39:23,030 INFO [train.py:1136] (0/2) Epoch 22, batch 400, loss[loss=0.1815, simple_loss=0.2742, pruned_loss=0.04441, over 86697.00 frames. ], tot_loss[loss=0.1937, simple_loss=0.2909, pruned_loss=0.04825, over 14822097.56 frames. ], batch size: 213, lr: 1.14e-02, grad_scale: 16.0 2024-10-08 09:39:32,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=210432.0, ans=0.125 2024-10-08 09:39:38,549 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.20 vs. 
limit=15.0 2024-10-08 09:39:42,215 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=210432.0, ans=0.025 2024-10-08 09:39:42,627 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.38 vs. limit=12.0 2024-10-08 09:39:53,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=210552.0, ans=0.0 2024-10-08 09:40:13,737 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.87 vs. limit=15.0 2024-10-08 09:40:27,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=210792.0, ans=0.1 2024-10-08 09:40:31,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=210792.0, ans=0.1 2024-10-08 09:40:32,712 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.727e+02 4.285e+02 4.646e+02 5.274e+02 9.072e+02, threshold=9.291e+02, percent-clipped=0.0 2024-10-08 09:40:59,687 INFO [train.py:1136] (0/2) Epoch 22, batch 450, loss[loss=0.208, simple_loss=0.3082, pruned_loss=0.05388, over 83384.00 frames. ], tot_loss[loss=0.1937, simple_loss=0.2908, pruned_loss=0.04828, over 15347596.18 frames. ], batch size: 1078, lr: 1.14e-02, grad_scale: 16.0 2024-10-08 09:41:05,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=211032.0, ans=0.2 2024-10-08 09:41:07,643 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.01 vs. limit=12.0 2024-10-08 09:41:50,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=211272.0, ans=0.125 2024-10-08 09:42:16,616 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=211512.0, ans=0.125 2024-10-08 09:42:24,334 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.39 vs. limit=15.0 2024-10-08 09:42:33,955 INFO [train.py:1136] (0/2) Epoch 22, batch 500, loss[loss=0.2151, simple_loss=0.3148, pruned_loss=0.05767, over 83562.00 frames. ], tot_loss[loss=0.1939, simple_loss=0.2911, pruned_loss=0.04835, over 15725674.75 frames. 
], batch size: 1079, lr: 1.14e-02, grad_scale: 16.0 2024-10-08 09:43:06,648 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-10-08 09:43:14,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=211872.0, ans=0.125 2024-10-08 09:43:21,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=211872.0, ans=0.07 2024-10-08 09:43:36,708 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.710e+02 4.619e+02 5.020e+02 5.713e+02 1.018e+03, threshold=1.004e+03, percent-clipped=1.0 2024-10-08 09:43:55,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=212112.0, ans=0.0 2024-10-08 09:43:57,811 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.38 vs. limit=6.0 2024-10-08 09:43:58,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=212112.0, ans=0.2 2024-10-08 09:44:07,021 INFO [train.py:1136] (0/2) Epoch 22, batch 550, loss[loss=0.1749, simple_loss=0.2746, pruned_loss=0.03756, over 87125.00 frames. ], tot_loss[loss=0.1926, simple_loss=0.2901, pruned_loss=0.04761, over 16056962.72 frames. ], batch size: 264, lr: 1.14e-02, grad_scale: 16.0 2024-10-08 09:44:07,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=212232.0, ans=0.125 2024-10-08 09:44:09,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=212232.0, ans=0.0 2024-10-08 09:44:12,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=212232.0, ans=0.125 2024-10-08 09:44:16,954 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.15 vs. limit=15.0 2024-10-08 09:44:30,303 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=212352.0, ans=0.125 2024-10-08 09:45:04,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=212592.0, ans=10.0 2024-10-08 09:45:40,419 INFO [train.py:1136] (0/2) Epoch 22, batch 600, loss[loss=0.179, simple_loss=0.2754, pruned_loss=0.04131, over 87062.00 frames. ], tot_loss[loss=0.1925, simple_loss=0.2899, pruned_loss=0.04749, over 16309739.82 frames. 
], batch size: 350, lr: 1.14e-02, grad_scale: 16.0 2024-10-08 09:46:28,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=213072.0, ans=0.5 2024-10-08 09:46:37,672 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=213192.0, ans=0.025 2024-10-08 09:46:40,994 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=213192.0, ans=0.125 2024-10-08 09:46:47,523 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.703e+02 4.252e+02 4.797e+02 5.741e+02 9.215e+02, threshold=9.595e+02, percent-clipped=0.0 2024-10-08 09:47:00,846 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.01 vs. limit=22.5 2024-10-08 09:47:13,239 INFO [train.py:1136] (0/2) Epoch 22, batch 650, loss[loss=0.1981, simple_loss=0.2956, pruned_loss=0.05031, over 87019.00 frames. ], tot_loss[loss=0.1932, simple_loss=0.2906, pruned_loss=0.0479, over 16490408.15 frames. ], batch size: 583, lr: 1.14e-02, grad_scale: 16.0 2024-10-08 09:47:37,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=213552.0, ans=0.2 2024-10-08 09:47:39,811 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.12 vs. limit=15.0 2024-10-08 09:48:09,073 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-10-08 09:48:28,720 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-10-08 09:48:45,024 INFO [train.py:1136] (0/2) Epoch 22, batch 700, loss[loss=0.187, simple_loss=0.2866, pruned_loss=0.04371, over 87365.00 frames. ], tot_loss[loss=0.1939, simple_loss=0.2914, pruned_loss=0.04823, over 16589854.65 frames. ], batch size: 415, lr: 1.14e-02, grad_scale: 16.0 2024-10-08 09:49:04,966 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.12 vs. limit=22.5 2024-10-08 09:49:06,271 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.57 vs. limit=22.5 2024-10-08 09:49:15,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=214272.0, ans=0.0 2024-10-08 09:49:23,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=214272.0, ans=0.0 2024-10-08 09:49:40,509 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.620e+02 4.297e+02 4.867e+02 5.684e+02 1.462e+03, threshold=9.734e+02, percent-clipped=1.0 2024-10-08 09:50:01,790 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.37 vs. limit=15.0 2024-10-08 09:50:07,117 INFO [train.py:1136] (0/2) Epoch 22, batch 750, loss[loss=0.204, simple_loss=0.3033, pruned_loss=0.05235, over 84480.00 frames. ], tot_loss[loss=0.1942, simple_loss=0.2918, pruned_loss=0.04831, over 16708151.27 frames. 
], batch size: 958, lr: 1.13e-02, grad_scale: 16.0 2024-10-08 09:50:14,044 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=214632.0, ans=0.05 2024-10-08 09:50:28,094 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=214752.0, ans=0.0 2024-10-08 09:50:45,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=214872.0, ans=0.2 2024-10-08 09:50:45,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=214872.0, ans=0.125 2024-10-08 09:50:49,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=214872.0, ans=15.0 2024-10-08 09:51:19,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=215112.0, ans=0.1 2024-10-08 09:51:27,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=215112.0, ans=0.125 2024-10-08 09:51:30,578 INFO [train.py:1136] (0/2) Epoch 22, batch 800, loss[loss=0.1855, simple_loss=0.2902, pruned_loss=0.04042, over 87306.00 frames. ], tot_loss[loss=0.1952, simple_loss=0.2926, pruned_loss=0.04889, over 16727161.29 frames. ], batch size: 415, lr: 1.13e-02, grad_scale: 32.0 2024-10-08 09:51:39,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=215232.0, ans=0.125 2024-10-08 09:51:56,878 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-22.pt 2024-10-08 09:52:49,668 INFO [train.py:1136] (0/2) Epoch 23, batch 0, loss[loss=0.1897, simple_loss=0.2791, pruned_loss=0.05014, over 85707.00 frames. ], tot_loss[loss=0.1897, simple_loss=0.2791, pruned_loss=0.05014, over 85707.00 frames. ], batch size: 180, lr: 1.11e-02, grad_scale: 32.0 2024-10-08 09:52:49,669 INFO [train.py:1159] (0/2) Computing validation loss 2024-10-08 09:53:00,614 INFO [train.py:1168] (0/2) Epoch 23, validation: loss=0.1688, simple_loss=0.2814, pruned_loss=0.02812, over 1382211.00 frames. 2024-10-08 09:53:00,614 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB 2024-10-08 09:53:05,593 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.70 vs. limit=8.0 2024-10-08 09:53:37,962 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.861e+02 4.537e+02 5.243e+02 6.085e+02 8.339e+02, threshold=1.049e+03, percent-clipped=0.0 2024-10-08 09:53:53,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=215664.0, ans=0.125 2024-10-08 09:54:37,768 INFO [train.py:1136] (0/2) Epoch 23, batch 50, loss[loss=0.204, simple_loss=0.2988, pruned_loss=0.05465, over 86823.00 frames. ], tot_loss[loss=0.193, simple_loss=0.2906, pruned_loss=0.0477, over 3838117.24 frames. ], batch size: 547, lr: 1.11e-02, grad_scale: 32.0 2024-10-08 09:54:39,105 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.88 vs. 
limit=15.0 2024-10-08 09:54:39,855 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=216024.0, ans=0.125 2024-10-08 09:55:06,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=216144.0, ans=0.125 2024-10-08 09:55:15,301 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.65 vs. limit=10.0 2024-10-08 09:55:17,937 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=216264.0, ans=0.125 2024-10-08 09:55:50,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=216384.0, ans=0.05 2024-10-08 09:56:15,319 INFO [train.py:1136] (0/2) Epoch 23, batch 100, loss[loss=0.1842, simple_loss=0.277, pruned_loss=0.04568, over 87072.00 frames. ], tot_loss[loss=0.1944, simple_loss=0.2916, pruned_loss=0.04857, over 6740501.95 frames. ], batch size: 264, lr: 1.10e-02, grad_scale: 32.0 2024-10-08 09:56:24,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=216624.0, ans=0.0 2024-10-08 09:56:25,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=216624.0, ans=0.125 2024-10-08 09:56:52,674 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.753e+02 4.366e+02 4.937e+02 5.881e+02 8.504e+02, threshold=9.874e+02, percent-clipped=0.0 2024-10-08 09:57:51,818 INFO [train.py:1136] (0/2) Epoch 23, batch 150, loss[loss=0.1854, simple_loss=0.2805, pruned_loss=0.04514, over 87299.00 frames. ], tot_loss[loss=0.1932, simple_loss=0.2901, pruned_loss=0.04817, over 9044616.67 frames. ], batch size: 313, lr: 1.10e-02, grad_scale: 32.0 2024-10-08 09:57:52,175 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=217224.0, ans=0.125 2024-10-08 09:58:45,090 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.76 vs. limit=22.5 2024-10-08 09:58:49,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=217584.0, ans=0.125 2024-10-08 09:59:14,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=217704.0, ans=0.125 2024-10-08 09:59:15,010 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.47 vs. limit=15.0 2024-10-08 09:59:28,288 INFO [train.py:1136] (0/2) Epoch 23, batch 200, loss[loss=0.1795, simple_loss=0.2817, pruned_loss=0.03861, over 87267.00 frames. ], tot_loss[loss=0.193, simple_loss=0.2899, pruned_loss=0.04805, over 10819744.62 frames. 
], batch size: 439, lr: 1.10e-02, grad_scale: 16.0 2024-10-08 09:59:28,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=217824.0, ans=0.2 2024-10-08 10:00:05,376 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.551e+02 4.263e+02 4.785e+02 5.345e+02 8.202e+02, threshold=9.570e+02, percent-clipped=0.0 2024-10-08 10:00:06,329 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.11 vs. limit=6.0 2024-10-08 10:00:07,490 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=218064.0, ans=0.1 2024-10-08 10:00:12,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=218064.0, ans=0.1 2024-10-08 10:00:26,601 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.41 vs. limit=12.0 2024-10-08 10:00:32,907 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.64 vs. limit=15.0 2024-10-08 10:00:35,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=218184.0, ans=0.125 2024-10-08 10:00:38,151 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=2.86 vs. limit=15.0 2024-10-08 10:00:41,856 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.08 vs. limit=15.0 2024-10-08 10:00:43,569 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.50 vs. limit=22.5 2024-10-08 10:01:02,598 INFO [train.py:1136] (0/2) Epoch 23, batch 250, loss[loss=0.1958, simple_loss=0.2901, pruned_loss=0.05074, over 87341.00 frames. ], tot_loss[loss=0.1932, simple_loss=0.2899, pruned_loss=0.04824, over 12182206.93 frames. ], batch size: 313, lr: 1.10e-02, grad_scale: 16.0 2024-10-08 10:01:15,974 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=218424.0, ans=0.0 2024-10-08 10:01:40,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=218664.0, ans=0.125 2024-10-08 10:01:40,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=218664.0, ans=0.2 2024-10-08 10:01:53,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=218664.0, ans=0.0 2024-10-08 10:02:01,375 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.21 vs. 
limit=10.0 2024-10-08 10:02:09,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=218784.0, ans=0.1 2024-10-08 10:02:23,068 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=218904.0, ans=0.125 2024-10-08 10:02:34,160 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.27 vs. limit=12.0 2024-10-08 10:02:38,687 INFO [train.py:1136] (0/2) Epoch 23, batch 300, loss[loss=0.1762, simple_loss=0.2734, pruned_loss=0.03951, over 86628.00 frames. ], tot_loss[loss=0.1938, simple_loss=0.291, pruned_loss=0.04831, over 13265268.70 frames. ], batch size: 246, lr: 1.10e-02, grad_scale: 16.0 2024-10-08 10:03:14,852 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.507e+02 4.435e+02 5.166e+02 5.836e+02 1.099e+03, threshold=1.033e+03, percent-clipped=3.0 2024-10-08 10:03:15,664 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.89 vs. limit=15.0 2024-10-08 10:03:48,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=219384.0, ans=0.1 2024-10-08 10:03:48,485 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.95 vs. limit=22.5 2024-10-08 10:04:10,691 INFO [train.py:1136] (0/2) Epoch 23, batch 350, loss[loss=0.1827, simple_loss=0.2822, pruned_loss=0.04155, over 87348.00 frames. ], tot_loss[loss=0.1924, simple_loss=0.2898, pruned_loss=0.04751, over 14143010.66 frames. ], batch size: 372, lr: 1.10e-02, grad_scale: 16.0 2024-10-08 10:04:28,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=219744.0, ans=0.0 2024-10-08 10:04:44,636 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.82 vs. limit=22.5 2024-10-08 10:04:54,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=219864.0, ans=0.0 2024-10-08 10:05:22,536 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.36 vs. limit=15.0 2024-10-08 10:05:47,082 INFO [train.py:1136] (0/2) Epoch 23, batch 400, loss[loss=0.1972, simple_loss=0.2969, pruned_loss=0.0488, over 87016.00 frames. ], tot_loss[loss=0.1927, simple_loss=0.29, pruned_loss=0.04765, over 14768503.86 frames. 
], batch size: 583, lr: 1.10e-02, grad_scale: 32.0 2024-10-08 10:05:54,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=220224.0, ans=0.125 2024-10-08 10:06:27,061 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.555e+02 4.286e+02 4.823e+02 5.828e+02 1.580e+03, threshold=9.645e+02, percent-clipped=2.0 2024-10-08 10:06:34,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=220464.0, ans=0.125 2024-10-08 10:06:41,208 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=220464.0, ans=0.2 2024-10-08 10:07:23,745 INFO [train.py:1136] (0/2) Epoch 23, batch 450, loss[loss=0.1995, simple_loss=0.3008, pruned_loss=0.04911, over 85786.00 frames. ], tot_loss[loss=0.1924, simple_loss=0.2901, pruned_loss=0.0474, over 15309157.85 frames. ], batch size: 721, lr: 1.10e-02, grad_scale: 32.0 2024-10-08 10:07:27,717 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=220824.0, ans=0.2 2024-10-08 10:07:50,637 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=220944.0, ans=0.025 2024-10-08 10:08:02,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=221064.0, ans=0.0 2024-10-08 10:08:52,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=221304.0, ans=0.025 2024-10-08 10:08:57,066 INFO [train.py:1136] (0/2) Epoch 23, batch 500, loss[loss=0.1837, simple_loss=0.2871, pruned_loss=0.04019, over 87130.00 frames. ], tot_loss[loss=0.1924, simple_loss=0.2901, pruned_loss=0.04733, over 15732153.40 frames. ], batch size: 517, lr: 1.09e-02, grad_scale: 32.0 2024-10-08 10:09:32,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=221544.0, ans=0.125 2024-10-08 10:09:35,138 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.88 vs. limit=15.0 2024-10-08 10:09:35,844 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.716e+02 4.480e+02 5.099e+02 5.837e+02 8.532e+02, threshold=1.020e+03, percent-clipped=0.0 2024-10-08 10:09:38,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=221664.0, ans=0.0 2024-10-08 10:09:42,444 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.70 vs. limit=15.0 2024-10-08 10:09:43,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=221664.0, ans=0.0 2024-10-08 10:09:47,086 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=221664.0, ans=0.0 2024-10-08 10:10:01,909 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.23 vs. limit=6.0 2024-10-08 10:10:24,812 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.64 vs. 
limit=22.5 2024-10-08 10:10:30,201 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0 2024-10-08 10:10:31,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=222024.0, ans=0.1 2024-10-08 10:10:32,336 INFO [train.py:1136] (0/2) Epoch 23, batch 550, loss[loss=0.1855, simple_loss=0.2795, pruned_loss=0.04569, over 87240.00 frames. ], tot_loss[loss=0.1928, simple_loss=0.2904, pruned_loss=0.04755, over 16025878.93 frames. ], batch size: 264, lr: 1.09e-02, grad_scale: 32.0 2024-10-08 10:10:37,912 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=222024.0, ans=0.0 2024-10-08 10:10:37,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=222024.0, ans=0.2 2024-10-08 10:10:43,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=222024.0, ans=0.125 2024-10-08 10:10:43,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=222024.0, ans=0.1 2024-10-08 10:10:48,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.whiten.whitening_limit, batch_count=222144.0, ans=12.0 2024-10-08 10:10:58,029 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=222144.0, ans=0.2 2024-10-08 10:10:59,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=222144.0, ans=0.125 2024-10-08 10:11:26,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=222264.0, ans=0.2 2024-10-08 10:11:34,802 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=222384.0, ans=0.125 2024-10-08 10:11:45,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=222384.0, ans=0.125 2024-10-08 10:11:48,899 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=222504.0, ans=0.05 2024-10-08 10:11:57,094 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=222504.0, ans=0.125 2024-10-08 10:12:02,031 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=222504.0, ans=0.125 2024-10-08 10:12:07,398 INFO [train.py:1136] (0/2) Epoch 23, batch 600, loss[loss=0.1755, simple_loss=0.267, pruned_loss=0.04193, over 85589.00 frames. ], tot_loss[loss=0.1922, simple_loss=0.2898, pruned_loss=0.04733, over 16264113.51 frames. 
], batch size: 180, lr: 1.09e-02, grad_scale: 32.0 2024-10-08 10:12:46,737 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.693e+02 4.347e+02 5.066e+02 5.945e+02 1.057e+03, threshold=1.013e+03, percent-clipped=1.0 2024-10-08 10:12:48,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=222864.0, ans=0.125 2024-10-08 10:13:12,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=222984.0, ans=0.2 2024-10-08 10:13:41,890 INFO [train.py:1136] (0/2) Epoch 23, batch 650, loss[loss=0.1771, simple_loss=0.282, pruned_loss=0.03609, over 87260.00 frames. ], tot_loss[loss=0.1932, simple_loss=0.2906, pruned_loss=0.04788, over 16419570.46 frames. ], batch size: 517, lr: 1.09e-02, grad_scale: 16.0 2024-10-08 10:14:05,435 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=223344.0, ans=0.1 2024-10-08 10:14:16,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=223464.0, ans=0.0 2024-10-08 10:14:23,595 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.42 vs. limit=15.0 2024-10-08 10:14:29,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=223464.0, ans=0.125 2024-10-08 10:14:35,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=223584.0, ans=0.125 2024-10-08 10:15:05,806 INFO [train.py:1136] (0/2) Epoch 23, batch 700, loss[loss=0.2029, simple_loss=0.2998, pruned_loss=0.05301, over 86420.00 frames. ], tot_loss[loss=0.1938, simple_loss=0.2915, pruned_loss=0.04808, over 16534773.64 frames. ], batch size: 668, lr: 1.09e-02, grad_scale: 16.0 2024-10-08 10:15:15,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=223824.0, ans=0.125 2024-10-08 10:15:24,154 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.77 vs. limit=15.0 2024-10-08 10:15:31,794 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.48 vs. limit=15.0 2024-10-08 10:15:41,760 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.564e+02 4.409e+02 4.998e+02 5.541e+02 1.175e+03, threshold=9.996e+02, percent-clipped=1.0 2024-10-08 10:15:43,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=224064.0, ans=0.0 2024-10-08 10:15:45,720 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.71 vs. 
limit=15.0 2024-10-08 10:16:15,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=224304.0, ans=0.2 2024-10-08 10:16:20,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=224304.0, ans=0.1 2024-10-08 10:16:20,927 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.62 vs. limit=15.0 2024-10-08 10:16:23,186 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.64 vs. limit=22.5 2024-10-08 10:16:28,885 INFO [train.py:1136] (0/2) Epoch 23, batch 750, loss[loss=0.1789, simple_loss=0.2754, pruned_loss=0.04118, over 86623.00 frames. ], tot_loss[loss=0.1929, simple_loss=0.2907, pruned_loss=0.04757, over 16672289.19 frames. ], batch size: 246, lr: 1.09e-02, grad_scale: 16.0 2024-10-08 10:16:35,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=224424.0, ans=0.0 2024-10-08 10:16:57,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=224544.0, ans=0.125 2024-10-08 10:17:50,547 INFO [train.py:1136] (0/2) Epoch 23, batch 800, loss[loss=0.2473, simple_loss=0.3341, pruned_loss=0.08023, over 78368.00 frames. ], tot_loss[loss=0.1931, simple_loss=0.2908, pruned_loss=0.0477, over 16722828.38 frames. ], batch size: 1493, lr: 1.09e-02, grad_scale: 32.0 2024-10-08 10:17:58,221 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-10-08 10:18:15,937 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-23.pt 2024-10-08 10:19:27,430 INFO [train.py:1136] (0/2) Epoch 24, batch 0, loss[loss=0.1764, simple_loss=0.267, pruned_loss=0.04283, over 85769.00 frames. ], tot_loss[loss=0.1764, simple_loss=0.267, pruned_loss=0.04283, over 85769.00 frames. ], batch size: 180, lr: 1.06e-02, grad_scale: 32.0 2024-10-08 10:19:27,432 INFO [train.py:1159] (0/2) Computing validation loss 2024-10-08 10:19:40,904 INFO [train.py:1168] (0/2) Epoch 24, validation: loss=0.1681, simple_loss=0.2801, pruned_loss=0.02804, over 1382211.00 frames. 
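A note on reading the loss and clipping records above: the per-batch "loss" is consistent with simple_loss_scale * simple_loss + pruned_loss (with simple_loss_scale = 0.5 once training is well past warm_step), and the "threshold" in the optim.py WARNING lines is consistent with Clipping_scale times the median of the printed grad-norm quartiles. Both relations are inferred from the logged numbers themselves, not taken from train.py or optim.py; a minimal Python sketch that checks them against two records above:

    def combined_loss(simple_loss, pruned_loss, simple_loss_scale=0.5):
        # How the reported per-batch "loss" appears to be assembled after warm-up
        # (assumption inferred from the logged values, not from the source code).
        return simple_loss_scale * simple_loss + pruned_loss

    def clip_threshold(median_grad_norm, clipping_scale=2.0):
        # The "threshold" printed by the optim.py WARNING lines
        # (assumed to be clipping_scale x the median grad-norm quartile).
        return clipping_scale * median_grad_norm

    # Epoch 24 validation record (10:19:40): loss=0.1681, simple_loss=0.2801, pruned_loss=0.02804
    assert abs(combined_loss(0.2801, 0.02804) - 0.1681) < 5e-4

    # WARNING record (10:15:41): quartiles 3.564e+02 4.409e+02 4.998e+02 5.541e+02 1.175e+03,
    # threshold=9.996e+02 -- i.e. 2.0 x 4.998e+02
    assert abs(clip_threshold(4.998e2) - 9.996e2) < 0.1

The same arithmetic reproduces the tot_loss records as well, e.g. 0.5 * 0.2919 + 0.04923 = 0.1952 for Epoch 21, batch 400.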
2024-10-08 10:19:40,905 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB 2024-10-08 10:19:43,149 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-10-08 10:19:49,154 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.614e+02 4.157e+02 4.966e+02 5.490e+02 8.991e+02, threshold=9.932e+02, percent-clipped=0.0 2024-10-08 10:19:49,674 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=225216.0, ans=0.025 2024-10-08 10:20:09,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=225336.0, ans=0.0 2024-10-08 10:20:14,250 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=225456.0, ans=0.0 2024-10-08 10:20:53,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=225696.0, ans=0.125 2024-10-08 10:21:01,054 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=225696.0, ans=0.125 2024-10-08 10:21:12,682 INFO [train.py:1136] (0/2) Epoch 24, batch 50, loss[loss=0.197, simple_loss=0.2975, pruned_loss=0.04826, over 86513.00 frames. ], tot_loss[loss=0.1886, simple_loss=0.2863, pruned_loss=0.04548, over 3891353.52 frames. ], batch size: 668, lr: 1.06e-02, grad_scale: 32.0 2024-10-08 10:21:35,965 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=225936.0, ans=0.025 2024-10-08 10:21:35,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=225936.0, ans=0.1 2024-10-08 10:21:58,584 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.56 vs. limit=15.0 2024-10-08 10:22:09,860 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=226176.0, ans=0.125 2024-10-08 10:22:38,973 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.81 vs. limit=15.0 2024-10-08 10:22:45,978 INFO [train.py:1136] (0/2) Epoch 24, batch 100, loss[loss=0.2025, simple_loss=0.3017, pruned_loss=0.05165, over 85526.00 frames. ], tot_loss[loss=0.1883, simple_loss=0.2859, pruned_loss=0.04538, over 6829063.47 frames. 
], batch size: 786, lr: 1.06e-02, grad_scale: 32.0 2024-10-08 10:22:56,852 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.681e+02 4.472e+02 5.028e+02 5.902e+02 7.793e+02, threshold=1.006e+03, percent-clipped=0.0 2024-10-08 10:23:01,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=226416.0, ans=0.125 2024-10-08 10:23:10,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=226536.0, ans=0.0 2024-10-08 10:24:10,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=226896.0, ans=0.125 2024-10-08 10:24:22,562 INFO [train.py:1136] (0/2) Epoch 24, batch 150, loss[loss=0.1815, simple_loss=0.2732, pruned_loss=0.04486, over 86116.00 frames. ], tot_loss[loss=0.1902, simple_loss=0.288, pruned_loss=0.04624, over 9059212.27 frames. ], batch size: 197, lr: 1.06e-02, grad_scale: 16.0 2024-10-08 10:24:35,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=227016.0, ans=0.035 2024-10-08 10:25:00,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=227256.0, ans=0.125 2024-10-08 10:25:10,694 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=227256.0, ans=0.2 2024-10-08 10:25:10,821 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=227256.0, ans=0.0 2024-10-08 10:25:30,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=227376.0, ans=0.0 2024-10-08 10:25:36,468 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=227376.0, ans=0.0 2024-10-08 10:25:55,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=227496.0, ans=0.1 2024-10-08 10:25:58,878 INFO [train.py:1136] (0/2) Epoch 24, batch 200, loss[loss=0.1739, simple_loss=0.2726, pruned_loss=0.03757, over 87186.00 frames. ], tot_loss[loss=0.192, simple_loss=0.2895, pruned_loss=0.04724, over 10805319.50 frames. ], batch size: 264, lr: 1.06e-02, grad_scale: 16.0 2024-10-08 10:26:11,757 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.757e+02 4.131e+02 4.546e+02 5.307e+02 1.734e+03, threshold=9.093e+02, percent-clipped=1.0 2024-10-08 10:26:23,131 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.13 vs. limit=22.5 2024-10-08 10:26:51,059 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.06 vs. limit=15.0 2024-10-08 10:26:53,919 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=227976.0, ans=0.0 2024-10-08 10:27:12,403 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.26 vs. limit=22.5 2024-10-08 10:27:32,207 INFO [train.py:1136] (0/2) Epoch 24, batch 250, loss[loss=0.1709, simple_loss=0.2673, pruned_loss=0.03724, over 86651.00 frames. 
], tot_loss[loss=0.1901, simple_loss=0.2877, pruned_loss=0.04624, over 12224723.69 frames. ], batch size: 229, lr: 1.06e-02, grad_scale: 16.0 2024-10-08 10:29:07,702 INFO [train.py:1136] (0/2) Epoch 24, batch 300, loss[loss=0.201, simple_loss=0.3025, pruned_loss=0.04977, over 85263.00 frames. ], tot_loss[loss=0.1901, simple_loss=0.2878, pruned_loss=0.04622, over 13306679.67 frames. ], batch size: 866, lr: 1.06e-02, grad_scale: 16.0 2024-10-08 10:29:18,644 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.600e+02 4.112e+02 4.658e+02 5.311e+02 7.704e+02, threshold=9.316e+02, percent-clipped=0.0 2024-10-08 10:29:18,998 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=228816.0, ans=0.125 2024-10-08 10:29:20,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=228816.0, ans=0.0 2024-10-08 10:29:22,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=228816.0, ans=0.1 2024-10-08 10:29:36,698 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=228936.0, ans=0.125 2024-10-08 10:29:40,531 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.18 vs. limit=15.0 2024-10-08 10:29:44,170 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.00 vs. limit=15.0 2024-10-08 10:29:45,013 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=229056.0, ans=0.125 2024-10-08 10:29:53,025 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=229056.0, ans=0.0 2024-10-08 10:29:58,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=229056.0, ans=0.125 2024-10-08 10:30:42,043 INFO [train.py:1136] (0/2) Epoch 24, batch 350, loss[loss=0.1954, simple_loss=0.2928, pruned_loss=0.04903, over 87081.00 frames. ], tot_loss[loss=0.1901, simple_loss=0.2879, pruned_loss=0.04617, over 14151323.56 frames. ], batch size: 548, lr: 1.05e-02, grad_scale: 16.0 2024-10-08 10:30:49,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=229416.0, ans=0.0 2024-10-08 10:30:53,951 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.46 vs. limit=15.0 2024-10-08 10:31:19,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=229656.0, ans=10.0 2024-10-08 10:31:30,702 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.45 vs. limit=10.0 2024-10-08 10:31:54,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=229776.0, ans=0.1 2024-10-08 10:32:17,164 INFO [train.py:1136] (0/2) Epoch 24, batch 400, loss[loss=0.1789, simple_loss=0.2776, pruned_loss=0.0401, over 87149.00 frames. 
2024-10-08 10:32:27,650 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.691e+02 4.456e+02 5.054e+02 5.976e+02 1.010e+03, threshold=1.011e+03, percent-clipped=1.0
2024-10-08 10:32:40,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=230136.0, ans=0.0
2024-10-08 10:32:42,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=230136.0, ans=0.125
2024-10-08 10:32:47,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=230136.0, ans=0.0
2024-10-08 10:32:54,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=230256.0, ans=0.125
2024-10-08 10:32:59,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=230256.0, ans=0.0
2024-10-08 10:33:08,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=230256.0, ans=0.125
2024-10-08 10:33:19,597 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.49 vs. limit=15.0
2024-10-08 10:33:20,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=230376.0, ans=0.0
2024-10-08 10:33:37,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=230496.0, ans=0.125
2024-10-08 10:33:52,533 INFO [train.py:1136] (0/2) Epoch 24, batch 450, loss[loss=0.1786, simple_loss=0.2736, pruned_loss=0.04175, over 86810.00 frames. ], tot_loss[loss=0.1895, simple_loss=0.2877, pruned_loss=0.0457, over 15357612.57 frames. ], batch size: 213, lr: 1.05e-02, grad_scale: 32.0
2024-10-08 10:34:05,787 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.23 vs. limit=10.0
2024-10-08 10:34:12,484 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.21 vs. limit=15.0
2024-10-08 10:34:13,768 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=230736.0, ans=0.125
2024-10-08 10:34:18,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=230736.0, ans=0.0
2024-10-08 10:34:36,971 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=230856.0, ans=0.2
2024-10-08 10:34:42,091 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=230856.0, ans=0.125
2024-10-08 10:34:56,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=230976.0, ans=0.125
2024-10-08 10:35:04,361 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.61 vs. limit=15.0
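In the train.py:1136 lines, the reported `loss` is consistent with the pruned loss plus half the simple loss: for epoch 24, batch 250 above, 0.5 x 0.2877 + 0.04624 = 0.19009, which prints as the logged 0.1901. A sketch of that combination; the 0.5 weight is inferred from the logged numbers, not stated in the log:

```python
def combined_loss(simple_loss: float, pruned_loss: float,
                  simple_scale: float = 0.5) -> float:
    """Total loss as logged: simple_scale * simple_loss + pruned_loss."""
    return simple_scale * simple_loss + pruned_loss

# Epoch 24, batch 250: tot_loss[loss=0.1901, simple_loss=0.2877, pruned_loss=0.04624]
print(round(combined_loss(0.2877, 0.04624), 4))  # -> 0.1901, the logged value
```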
2024-10-08 10:35:10,987 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=231096.0, ans=0.125
2024-10-08 10:35:14,391 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=231096.0, ans=0.0
2024-10-08 10:35:27,200 INFO [train.py:1136] (0/2) Epoch 24, batch 500, loss[loss=0.194, simple_loss=0.2943, pruned_loss=0.04688, over 86279.00 frames. ], tot_loss[loss=0.1895, simple_loss=0.2877, pruned_loss=0.04569, over 15749468.58 frames. ], batch size: 620, lr: 1.05e-02, grad_scale: 8.0
2024-10-08 10:35:43,110 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.332e+02 4.269e+02 4.780e+02 5.157e+02 8.455e+02, threshold=9.559e+02, percent-clipped=0.0
2024-10-08 10:35:43,590 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=231216.0, ans=0.0
2024-10-08 10:36:17,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=231456.0, ans=0.125
2024-10-08 10:36:59,385 INFO [train.py:1136] (0/2) Epoch 24, batch 550, loss[loss=0.2005, simple_loss=0.3008, pruned_loss=0.05017, over 85914.00 frames. ], tot_loss[loss=0.1894, simple_loss=0.2875, pruned_loss=0.04565, over 16054093.72 frames. ], batch size: 721, lr: 1.05e-02, grad_scale: 8.0
2024-10-08 10:37:13,635 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=231816.0, ans=15.0
2024-10-08 10:37:48,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=232056.0, ans=0.1
2024-10-08 10:38:34,212 INFO [train.py:1136] (0/2) Epoch 24, batch 600, loss[loss=0.1797, simple_loss=0.2794, pruned_loss=0.04001, over 87106.00 frames. ], tot_loss[loss=0.1894, simple_loss=0.2876, pruned_loss=0.04562, over 16307798.06 frames. ], batch size: 350, lr: 1.05e-02, grad_scale: 8.0
2024-10-08 10:38:34,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=232416.0, ans=0.0
2024-10-08 10:38:48,291 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.647e+02 4.235e+02 4.982e+02 6.020e+02 1.010e+03, threshold=9.963e+02, percent-clipped=1.0
2024-10-08 10:38:50,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=232536.0, ans=0.1
2024-10-08 10:38:57,577 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=232536.0, ans=0.0
2024-10-08 10:39:05,517 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.82 vs. limit=15.0
2024-10-08 10:39:35,178 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=232776.0, ans=0.0
2024-10-08 10:39:36,913 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=232776.0, ans=0.125
2024-10-08 10:39:47,697 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.96 vs. limit=22.5
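`grad_scale` in the progress lines is the dynamic loss scale of mixed-precision training; it is halved when scaled gradients overflow (hence the drop from 32.0 to 8.0 across the batches above) and grows back gradually. A generic torch.cuda.amp sketch of the mechanism, not train.py itself; the tiny model and data here are placeholders:

```python
import torch

model = torch.nn.Linear(80, 500).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.045)
scaler = torch.cuda.amp.GradScaler()  # maintains the dynamic grad_scale

x = torch.randn(8, 80, device="cuda")
with torch.cuda.amp.autocast(dtype=torch.float16):
    loss = model(x).square().mean()

scaler.scale(loss).backward()  # backward on the scaled loss
scaler.step(opt)               # unscales grads, skips the step on inf/nan
scaler.update()                # halves the scale on overflow, else slowly grows it
print("grad_scale:", scaler.get_scale())
```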
2024-10-08 10:40:08,108 INFO [train.py:1136] (0/2) Epoch 24, batch 650, loss[loss=0.1828, simple_loss=0.2845, pruned_loss=0.04057, over 87374.00 frames. ], tot_loss[loss=0.19, simple_loss=0.2882, pruned_loss=0.04589, over 16493826.75 frames. ], batch size: 439, lr: 1.05e-02, grad_scale: 8.0
2024-10-08 10:40:25,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=233016.0, ans=0.025
2024-10-08 10:40:29,002 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=233136.0, ans=0.125
2024-10-08 10:40:29,516 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=3.76 vs. limit=12.0
2024-10-08 10:40:36,025 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.11 vs. limit=22.5
2024-10-08 10:41:39,868 INFO [train.py:1136] (0/2) Epoch 24, batch 700, loss[loss=0.1832, simple_loss=0.2793, pruned_loss=0.04348, over 87106.00 frames. ], tot_loss[loss=0.1895, simple_loss=0.2874, pruned_loss=0.04582, over 16657845.55 frames. ], batch size: 330, lr: 1.05e-02, grad_scale: 8.0
2024-10-08 10:41:40,230 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=233616.0, ans=0.125
2024-10-08 10:41:45,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=233616.0, ans=0.1
2024-10-08 10:41:52,642 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.661e+02 4.243e+02 4.883e+02 5.770e+02 1.378e+03, threshold=9.766e+02, percent-clipped=1.0
2024-10-08 10:42:39,902 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=233976.0, ans=0.125
2024-10-08 10:42:46,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=234096.0, ans=0.125
2024-10-08 10:43:00,561 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-10-08 10:43:02,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=234216.0, ans=0.0
2024-10-08 10:43:03,437 INFO [train.py:1136] (0/2) Epoch 24, batch 750, loss[loss=0.1781, simple_loss=0.2801, pruned_loss=0.03805, over 87241.00 frames. ], tot_loss[loss=0.1901, simple_loss=0.288, pruned_loss=0.04608, over 16724003.78 frames. ], batch size: 415, lr: 1.05e-02, grad_scale: 8.0
2024-10-08 10:43:05,823 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.50 vs. limit=22.5
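The scaling.py:1024 lines compare a whitening `metric` of some activation's channel covariance against a `limit`; values near 1.0 mean the covariance is close to white, and a metric above the limit triggers a corrective gradient. The sketch below shows one such metric, the mean squared eigenvalue over the squared mean eigenvalue; this formula is an assumption for illustration, not necessarily icefall's exact one:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """How far the channel covariance of `x` (frames x channels) is from white.

    Computes, per channel group, mean(eig^2) / mean(eig)^2 of the covariance;
    1.0 means perfectly white. Illustrates the 'metric=... vs. limit=...'
    lines, not icefall's exact formula.
    """
    num_frames, num_channels = x.shape
    x = x.reshape(num_frames, num_groups, num_channels // num_groups)
    metrics = []
    for g in range(num_groups):
        xg = x[:, g, :]
        cov = (xg.t() @ xg) / num_frames
        eigs = torch.linalg.eigvalsh(cov)
        metrics.append(((eigs ** 2).mean() / eigs.mean() ** 2).item())
    return max(metrics)

white = torch.randn(10000, 384)
print(whitening_metric(white))  # close to 1.0
print(whitening_metric(white @ torch.diag(torch.linspace(0.1, 3.0, 384))))  # much larger
```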
2024-10-08 10:43:28,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=234336.0, ans=0.125
2024-10-08 10:43:39,933 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=234456.0, ans=0.125
2024-10-08 10:43:52,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=234576.0, ans=0.025
2024-10-08 10:44:27,360 INFO [train.py:1136] (0/2) Epoch 24, batch 800, loss[loss=0.1903, simple_loss=0.292, pruned_loss=0.04434, over 86500.00 frames. ], tot_loss[loss=0.1906, simple_loss=0.2886, pruned_loss=0.04624, over 16781131.92 frames. ], batch size: 668, lr: 1.04e-02, grad_scale: 16.0
2024-10-08 10:44:41,674 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.486e+02 4.695e+02 5.387e+02 5.965e+02 1.777e+03, threshold=1.077e+03, percent-clipped=2.0
2024-10-08 10:44:45,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=234936.0, ans=0.05
2024-10-08 10:44:47,524 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.94 vs. limit=15.0
2024-10-08 10:44:48,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=234936.0, ans=0.125
2024-10-08 10:44:54,797 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-24.pt
2024-10-08 10:45:59,553 INFO [train.py:1136] (0/2) Epoch 25, batch 0, loss[loss=0.1869, simple_loss=0.2838, pruned_loss=0.04499, over 87191.00 frames. ], tot_loss[loss=0.1869, simple_loss=0.2838, pruned_loss=0.04499, over 87191.00 frames. ], batch size: 296, lr: 1.02e-02, grad_scale: 32.0
2024-10-08 10:45:59,555 INFO [train.py:1159] (0/2) Computing validation loss
2024-10-08 10:46:10,480 INFO [train.py:1168] (0/2) Epoch 25, validation: loss=0.1711, simple_loss=0.285, pruned_loss=0.02858, over 1382211.00 frames.
2024-10-08 10:46:10,481 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB
2024-10-08 10:46:11,411 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.62 vs. limit=15.0
2024-10-08 10:47:04,934 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0
2024-10-08 10:47:06,332 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=235368.0, ans=0.0
2024-10-08 10:47:20,532 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.63 vs. limit=15.0
2024-10-08 10:47:26,602 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=235488.0, ans=0.125
2024-10-08 10:47:44,635 INFO [train.py:1136] (0/2) Epoch 25, batch 50, loss[loss=0.185, simple_loss=0.2791, pruned_loss=0.04542, over 87299.00 frames. ], tot_loss[loss=0.1896, simple_loss=0.2882, pruned_loss=0.04553, over 3852413.22 frames. ], batch size: 296, lr: 1.02e-02, grad_scale: 32.0
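At each epoch boundary (batch 0) the loop pauses to compute a validation loss over the same fixed 1382211-frame dev set before resuming training, as in the 'Computing validation loss' / 'validation: loss=...' pair above. A minimal sketch with a hypothetical loader interface, not train.py's actual code:

```python
import torch

def compute_validation_loss(model, valid_loader, device="cuda"):
    """Frame-weighted average loss over the dev set.

    Assumes each batch yields (features, frame_count); that interface is
    hypothetical, for illustration only.
    """
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for feats, num_frames in valid_loader:
            loss = model(feats.to(device))        # scalar loss for this batch
            tot_loss += loss.item() * num_frames  # weight by frames
            tot_frames += num_frames
    model.train()  # resume training mode afterwards
    return tot_loss / tot_frames
```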
2024-10-08 10:47:52,426 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.16 vs. limit=15.0
2024-10-08 10:47:58,694 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=235608.0, ans=0.2
2024-10-08 10:48:23,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=235848.0, ans=0.0
2024-10-08 10:48:46,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=235968.0, ans=0.125
2024-10-08 10:49:00,124 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.748e+02 4.216e+02 4.513e+02 5.213e+02 9.501e+02, threshold=9.025e+02, percent-clipped=0.0
2024-10-08 10:49:07,895 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=7.91 vs. limit=15.0
2024-10-08 10:49:15,717 INFO [train.py:1136] (0/2) Epoch 25, batch 100, loss[loss=0.1837, simple_loss=0.2876, pruned_loss=0.03986, over 87330.00 frames. ], tot_loss[loss=0.1891, simple_loss=0.2872, pruned_loss=0.04551, over 6804025.37 frames. ], batch size: 517, lr: 1.02e-02, grad_scale: 32.0
2024-10-08 10:49:50,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=236328.0, ans=0.125
2024-10-08 10:50:01,309 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=236448.0, ans=0.1
2024-10-08 10:50:30,397 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.68 vs. limit=15.0
2024-10-08 10:50:33,793 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.26 vs. limit=15.0
2024-10-08 10:50:42,377 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.56 vs. limit=15.0
2024-10-08 10:50:52,380 INFO [train.py:1136] (0/2) Epoch 25, batch 150, loss[loss=0.1979, simple_loss=0.2971, pruned_loss=0.04931, over 86208.00 frames. ], tot_loss[loss=0.1901, simple_loss=0.2884, pruned_loss=0.04588, over 9055614.97 frames. ], batch size: 667, lr: 1.02e-02, grad_scale: 32.0
2024-10-08 10:51:14,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=236928.0, ans=0.07
2024-10-08 10:51:41,579 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-10-08 10:52:09,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=237288.0, ans=0.125
2024-10-08 10:52:12,651 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.539e+02 4.324e+02 4.870e+02 5.493e+02 7.954e+02, threshold=9.740e+02, percent-clipped=0.0
2024-10-08 10:52:26,453 INFO [train.py:1136] (0/2) Epoch 25, batch 200, loss[loss=0.1972, simple_loss=0.3017, pruned_loss=0.04635, over 83304.00 frames. ], tot_loss[loss=0.1895, simple_loss=0.2879, pruned_loss=0.04561, over 10859252.95 frames. ], batch size: 1077, lr: 1.02e-02, grad_scale: 32.0
2024-10-08 10:53:06,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=237648.0, ans=0.0
2024-10-08 10:53:11,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=237648.0, ans=0.05
2024-10-08 10:53:14,991 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=237648.0, ans=0.125
2024-10-08 10:53:16,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=237648.0, ans=0.1
2024-10-08 10:53:20,318 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=237648.0, ans=0.0
2024-10-08 10:54:03,129 INFO [train.py:1136] (0/2) Epoch 25, batch 250, loss[loss=0.2232, simple_loss=0.3128, pruned_loss=0.06681, over 69645.00 frames. ], tot_loss[loss=0.1895, simple_loss=0.2877, pruned_loss=0.04561, over 12229434.40 frames. ], batch size: 1960, lr: 1.02e-02, grad_scale: 32.0
2024-10-08 10:54:03,619 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=238008.0, ans=0.09899494936611666
2024-10-08 10:54:25,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=238128.0, ans=0.125
2024-10-08 10:55:01,423 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=238368.0, ans=0.125
2024-10-08 10:55:20,237 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.396e+02 4.147e+02 4.565e+02 4.972e+02 7.723e+02, threshold=9.130e+02, percent-clipped=0.0
2024-10-08 10:55:38,863 INFO [train.py:1136] (0/2) Epoch 25, batch 300, loss[loss=0.1966, simple_loss=0.2992, pruned_loss=0.04701, over 85486.00 frames. ], tot_loss[loss=0.1896, simple_loss=0.2878, pruned_loss=0.04574, over 13294652.16 frames. ], batch size: 787, lr: 1.02e-02, grad_scale: 32.0
2024-10-08 10:55:44,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=238608.0, ans=0.0
2024-10-08 10:55:51,833 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-10-08 10:56:23,326 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=238848.0, ans=0.2
2024-10-08 10:56:35,647 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=238968.0, ans=0.125
2024-10-08 10:56:39,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=238968.0, ans=0.1
2024-10-08 10:56:46,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=238968.0, ans=0.125
2024-10-08 10:56:59,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=239088.0, ans=0.0
2024-10-08 10:57:13,361 INFO [train.py:1136] (0/2) Epoch 25, batch 350, loss[loss=0.2187, simple_loss=0.3086, pruned_loss=0.06443, over 69954.00 frames. ], tot_loss[loss=0.1908, simple_loss=0.2888, pruned_loss=0.04641, over 14078584.75 frames. ], batch size: 1960, lr: 1.01e-02, grad_scale: 32.0
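`tot_loss[..., over N frames]` is a frame-weighted running aggregate: each batch's loss enters weighted by its frame count, so the `over N frames` total grows batch by batch and restarts with each epoch. A sketch of the aggregation, not necessarily train.py's actual tracker:

```python
class RunningLoss:
    """Frame-weighted running average behind 'tot_loss[..., over N frames]'."""

    def __init__(self):
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> None:
        self.loss_sum += batch_loss * batch_frames
        self.frames += batch_frames

    @property
    def average(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

tracker = RunningLoss()
tracker.update(0.2232, 69645.0)  # one batch's loss, weighted by its frames
print(f"tot_loss over {tracker.frames:.0f} frames: {tracker.average:.4f}")
```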
2024-10-08 10:58:21,598 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.69 vs. limit=15.0
2024-10-08 10:58:32,305 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.538e+02 4.129e+02 4.581e+02 5.180e+02 8.084e+02, threshold=9.162e+02, percent-clipped=0.0
2024-10-08 10:58:45,647 INFO [train.py:1136] (0/2) Epoch 25, batch 400, loss[loss=0.1771, simple_loss=0.2793, pruned_loss=0.0375, over 87300.00 frames. ], tot_loss[loss=0.1901, simple_loss=0.288, pruned_loss=0.04607, over 14761224.50 frames. ], batch size: 415, lr: 1.01e-02, grad_scale: 32.0
2024-10-08 10:59:03,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=239928.0, ans=0.125
2024-10-08 10:59:11,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=239928.0, ans=0.1
2024-10-08 10:59:11,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=239928.0, ans=0.025
2024-10-08 10:59:14,733 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/checkpoint-20000.pt
2024-10-08 10:59:20,035 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=239928.0, ans=0.125
2024-10-08 10:59:20,141 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=239928.0, ans=0.125
2024-10-08 10:59:25,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=240048.0, ans=0.125
2024-10-08 10:59:27,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=240048.0, ans=0.1
2024-10-08 10:59:32,556 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=240048.0, ans=0.125
2024-10-08 10:59:35,880 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=240048.0, ans=0.07
2024-10-08 10:59:38,055 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.23 vs. limit=15.0
2024-10-08 10:59:59,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=240168.0, ans=0.125
2024-10-08 10:59:59,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=240168.0, ans=0.125
2024-10-08 11:00:00,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=240168.0, ans=0.125
2024-10-08 11:00:12,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=240288.0, ans=0.125
2024-10-08 11:00:12,279 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.35 vs. limit=15.0
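Besides the per-epoch files (epoch-24.pt, epoch-25.pt), a checkpoint-<batch>.pt such as checkpoint-20000.pt above is written every fixed number of training batches. A sketch of that batch-interval checkpointing; the interval and saved fields here are illustrative, and real checkpoints carry more state (sampler, grad scaler, averaged model, ...):

```python
from pathlib import Path
import torch

def maybe_save_checkpoint(model, optimizer, batch_idx: int,
                          exp_dir: Path = Path("zipformer/exp"),
                          save_every_n: int = 4000) -> None:
    """Save 'checkpoint-<batch>.pt' every save_every_n training batches.

    Illustrates the behavior behind 'Saving checkpoint to
    zipformer/exp/checkpoint-20000.pt'; save_every_n is a hypothetical value.
    """
    if batch_idx == 0 or batch_idx % save_every_n != 0:
        return
    exp_dir.mkdir(parents=True, exist_ok=True)
    torch.save(
        {"model": model.state_dict(),
         "optimizer": optimizer.state_dict(),
         "batch_idx_train": batch_idx},
        exp_dir / f"checkpoint-{batch_idx}.pt",
    )
```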
2024-10-08 11:00:23,568 INFO [train.py:1136] (0/2) Epoch 25, batch 450, loss[loss=0.1766, simple_loss=0.2822, pruned_loss=0.0355, over 87190.00 frames. ], tot_loss[loss=0.19, simple_loss=0.288, pruned_loss=0.04603, over 15258792.99 frames. ], batch size: 517, lr: 1.01e-02, grad_scale: 16.0
2024-10-08 11:00:59,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=240528.0, ans=0.2
2024-10-08 11:01:04,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=240648.0, ans=0.1
2024-10-08 11:01:07,723 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=240648.0, ans=0.0
2024-10-08 11:01:28,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=240768.0, ans=0.125
2024-10-08 11:01:28,673 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=240768.0, ans=0.05
2024-10-08 11:01:34,850 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.34 vs. limit=15.0
2024-10-08 11:01:47,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=240888.0, ans=0.125
2024-10-08 11:01:48,739 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.639e+02 4.409e+02 5.066e+02 5.829e+02 8.397e+02, threshold=1.013e+03, percent-clipped=0.0
2024-10-08 11:02:00,840 INFO [train.py:1136] (0/2) Epoch 25, batch 500, loss[loss=0.1685, simple_loss=0.2599, pruned_loss=0.03857, over 85842.00 frames. ], tot_loss[loss=0.1901, simple_loss=0.288, pruned_loss=0.0461, over 15630445.38 frames. ], batch size: 180, lr: 1.01e-02, grad_scale: 16.0
2024-10-08 11:03:13,364 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.03 vs. limit=22.5
2024-10-08 11:03:35,535 INFO [train.py:1136] (0/2) Epoch 25, batch 550, loss[loss=0.1737, simple_loss=0.2653, pruned_loss=0.041, over 86671.00 frames. ], tot_loss[loss=0.1896, simple_loss=0.2876, pruned_loss=0.04583, over 15966579.07 frames. ], batch size: 213, lr: 1.01e-02, grad_scale: 16.0
2024-10-08 11:03:55,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=241728.0, ans=0.07
2024-10-08 11:04:23,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=241848.0, ans=0.0
2024-10-08 11:05:00,940 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.796e+02 4.244e+02 4.602e+02 5.327e+02 9.371e+02, threshold=9.204e+02, percent-clipped=0.0
2024-10-08 11:05:05,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=242088.0, ans=0.1
2024-10-08 11:05:13,498 INFO [train.py:1136] (0/2) Epoch 25, batch 600, loss[loss=0.1784, simple_loss=0.2773, pruned_loss=0.03976, over 87108.00 frames. ], tot_loss[loss=0.19, simple_loss=0.288, pruned_loss=0.04596, over 16211422.34 frames. ], batch size: 330, lr: 1.01e-02, grad_scale: 16.0
2024-10-08 11:05:17,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=242208.0, ans=0.125
2024-10-08 11:05:21,769 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.40 vs. limit=15.0
2024-10-08 11:05:31,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=242328.0, ans=0.125
2024-10-08 11:05:53,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=242448.0, ans=0.0
2024-10-08 11:06:00,972 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.29 vs. limit=15.0
2024-10-08 11:06:13,137 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.21 vs. limit=15.0
2024-10-08 11:06:37,369 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=242688.0, ans=0.2
2024-10-08 11:06:50,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=242688.0, ans=0.0
2024-10-08 11:06:53,383 INFO [train.py:1136] (0/2) Epoch 25, batch 650, loss[loss=0.1976, simple_loss=0.2955, pruned_loss=0.04988, over 86036.00 frames. ], tot_loss[loss=0.1902, simple_loss=0.2883, pruned_loss=0.04598, over 16392397.42 frames. ], batch size: 721, lr: 1.01e-02, grad_scale: 16.0
2024-10-08 11:07:01,189 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.19 vs. limit=22.5
2024-10-08 11:07:02,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=242808.0, ans=0.0
2024-10-08 11:07:10,400 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=242928.0, ans=0.125
2024-10-08 11:07:10,410 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=242928.0, ans=0.025
2024-10-08 11:07:55,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=243168.0, ans=0.125
2024-10-08 11:08:04,868 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.644e+02 4.110e+02 4.574e+02 5.371e+02 7.730e+02, threshold=9.148e+02, percent-clipped=0.0
2024-10-08 11:08:18,158 INFO [train.py:1136] (0/2) Epoch 25, batch 700, loss[loss=0.2272, simple_loss=0.3222, pruned_loss=0.06603, over 79004.00 frames. ], tot_loss[loss=0.189, simple_loss=0.2873, pruned_loss=0.04533, over 16581354.12 frames. ], batch size: 1493, lr: 1.01e-02, grad_scale: 16.0
2024-10-08 11:08:40,750 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=243528.0, ans=0.125
2024-10-08 11:08:43,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=243528.0, ans=0.09899494936611666
2024-10-08 11:09:01,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=243648.0, ans=0.0
2024-10-08 11:09:27,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=243888.0, ans=0.125
2024-10-08 11:09:39,708 INFO [train.py:1136] (0/2) Epoch 25, batch 750, loss[loss=0.1811, simple_loss=0.2875, pruned_loss=0.03736, over 87265.00 frames. ], tot_loss[loss=0.1884, simple_loss=0.2869, pruned_loss=0.04491, over 16705112.97 frames. ], batch size: 517, lr: 1.01e-02, grad_scale: 16.0
2024-10-08 11:09:48,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=244008.0, ans=0.1
2024-10-08 11:10:03,116 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=244128.0, ans=0.125
2024-10-08 11:10:11,336 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=244248.0, ans=0.125
2024-10-08 11:10:17,766 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=244248.0, ans=0.0
2024-10-08 11:10:50,594 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.522e+02 4.260e+02 4.889e+02 5.585e+02 1.195e+03, threshold=9.779e+02, percent-clipped=2.0
2024-10-08 11:10:54,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=244488.0, ans=0.0
2024-10-08 11:11:02,079 INFO [train.py:1136] (0/2) Epoch 25, batch 800, loss[loss=0.1831, simple_loss=0.2872, pruned_loss=0.03948, over 87141.00 frames. ], tot_loss[loss=0.1887, simple_loss=0.2872, pruned_loss=0.0451, over 16766345.58 frames. ], batch size: 517, lr: 1.00e-02, grad_scale: 32.0
2024-10-08 11:11:14,098 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=244608.0, ans=0.125
2024-10-08 11:11:22,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=244728.0, ans=0.125
2024-10-08 11:11:28,232 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-25.pt
2024-10-08 11:12:19,008 INFO [train.py:1136] (0/2) Epoch 26, batch 0, loss[loss=0.1778, simple_loss=0.2823, pruned_loss=0.03661, over 87054.00 frames. ], tot_loss[loss=0.1778, simple_loss=0.2823, pruned_loss=0.03661, over 87054.00 frames. ], batch size: 517, lr: 9.84e-03, grad_scale: 32.0
2024-10-08 11:12:19,009 INFO [train.py:1159] (0/2) Computing validation loss
2024-10-08 11:12:30,025 INFO [train.py:1168] (0/2) Epoch 26, validation: loss=0.1675, simple_loss=0.2794, pruned_loss=0.02779, over 1382211.00 frames.
2024-10-08 11:12:30,026 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB
2024-10-08 11:12:48,086 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=244920.0, ans=0.0
2024-10-08 11:12:48,828 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.04 vs. limit=6.0
2024-10-08 11:12:57,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=244920.0, ans=0.1
2024-10-08 11:13:17,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=245040.0, ans=0.1
2024-10-08 11:13:46,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=245280.0, ans=0.125
2024-10-08 11:13:48,498 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-10-08 11:14:05,994 INFO [train.py:1136] (0/2) Epoch 26, batch 50, loss[loss=0.1756, simple_loss=0.2812, pruned_loss=0.03499, over 87481.00 frames. ], tot_loss[loss=0.1842, simple_loss=0.2838, pruned_loss=0.0423, over 3879915.78 frames. ], batch size: 439, lr: 9.83e-03, grad_scale: 32.0
2024-10-08 11:14:06,490 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=245400.0, ans=0.2
2024-10-08 11:14:06,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=245400.0, ans=0.125
2024-10-08 11:14:14,971 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=245400.0, ans=0.0
2024-10-08 11:14:20,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=245400.0, ans=0.2
2024-10-08 11:14:27,808 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=245520.0, ans=0.125
2024-10-08 11:14:51,744 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.88 vs. limit=10.0
2024-10-08 11:14:57,384 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.621e+02 4.175e+02 4.828e+02 5.465e+02 7.812e+02, threshold=9.655e+02, percent-clipped=0.0
2024-10-08 11:15:09,651 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-10-08 11:15:25,730 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-10-08 11:15:28,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=245880.0, ans=0.2
2024-10-08 11:15:38,957 INFO [train.py:1136] (0/2) Epoch 26, batch 100, loss[loss=0.2018, simple_loss=0.3016, pruned_loss=0.05094, over 85320.00 frames. ], tot_loss[loss=0.1846, simple_loss=0.2841, pruned_loss=0.04254, over 6839842.11 frames. ], batch size: 866, lr: 9.82e-03, grad_scale: 32.0
2024-10-08 11:16:40,414 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=246360.0, ans=0.1
2024-10-08 11:17:12,135 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=246480.0, ans=0.025
2024-10-08 11:17:14,968 INFO [train.py:1136] (0/2) Epoch 26, batch 150, loss[loss=0.1747, simple_loss=0.2754, pruned_loss=0.03698, over 87411.00 frames. ], tot_loss[loss=0.1863, simple_loss=0.2855, pruned_loss=0.04356, over 9103389.71 frames. ], batch size: 393, lr: 9.81e-03, grad_scale: 32.0
2024-10-08 11:17:57,916 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=246840.0, ans=0.0
2024-10-08 11:18:02,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=246840.0, ans=0.125
2024-10-08 11:18:05,819 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.698e+02 4.288e+02 4.680e+02 5.314e+02 7.477e+02, threshold=9.359e+02, percent-clipped=0.0
2024-10-08 11:18:09,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=246960.0, ans=0.125
2024-10-08 11:18:16,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=246960.0, ans=0.0
2024-10-08 11:18:28,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=247080.0, ans=0.125
2024-10-08 11:18:48,085 INFO [train.py:1136] (0/2) Epoch 26, batch 200, loss[loss=0.1736, simple_loss=0.2682, pruned_loss=0.03952, over 86693.00 frames. ], tot_loss[loss=0.1864, simple_loss=0.2851, pruned_loss=0.04388, over 10905694.00 frames. ], batch size: 213, lr: 9.80e-03, grad_scale: 16.0
2024-10-08 11:19:34,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=247440.0, ans=0.07
2024-10-08 11:19:36,805 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.22 vs. limit=10.0
2024-10-08 11:19:41,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=247440.0, ans=0.1
2024-10-08 11:20:04,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=247680.0, ans=0.125
2024-10-08 11:20:08,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=247680.0, ans=0.0
2024-10-08 11:20:24,004 INFO [train.py:1136] (0/2) Epoch 26, batch 250, loss[loss=0.2054, simple_loss=0.3044, pruned_loss=0.05315, over 85443.00 frames. ], tot_loss[loss=0.1871, simple_loss=0.2857, pruned_loss=0.04431, over 12278949.73 frames. ], batch size: 786, lr: 9.79e-03, grad_scale: 8.0
2024-10-08 11:21:04,678 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=248040.0, ans=0.125
2024-10-08 11:21:06,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=248040.0, ans=0.125
2024-10-08 11:21:06,760 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.60 vs. limit=15.0
2024-10-08 11:21:18,500 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.536e+02 4.241e+02 4.784e+02 5.537e+02 9.142e+02, threshold=9.567e+02, percent-clipped=0.0
2024-10-08 11:21:37,389 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=248160.0, ans=0.125
2024-10-08 11:21:46,690 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.53 vs. limit=15.0
2024-10-08 11:21:57,461 INFO [train.py:1136] (0/2) Epoch 26, batch 300, loss[loss=0.1951, simple_loss=0.2944, pruned_loss=0.04785, over 86319.00 frames. ], tot_loss[loss=0.1865, simple_loss=0.2851, pruned_loss=0.04396, over 13382085.53 frames. ], batch size: 667, lr: 9.78e-03, grad_scale: 8.0
2024-10-08 11:22:20,044 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.95 vs. limit=15.0
2024-10-08 11:22:48,400 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=248640.0, ans=0.125
2024-10-08 11:22:51,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=248760.0, ans=0.125
2024-10-08 11:23:05,883 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=248760.0, ans=0.0
2024-10-08 11:23:09,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=248880.0, ans=0.125
2024-10-08 11:23:09,396 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=248880.0, ans=0.025
2024-10-08 11:23:22,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=248880.0, ans=0.0
2024-10-08 11:23:30,678 INFO [train.py:1136] (0/2) Epoch 26, batch 350, loss[loss=0.1822, simple_loss=0.2774, pruned_loss=0.04352, over 86772.00 frames. ], tot_loss[loss=0.1852, simple_loss=0.284, pruned_loss=0.04322, over 14246764.39 frames. ], batch size: 246, lr: 9.77e-03, grad_scale: 8.0
2024-10-08 11:23:35,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=249000.0, ans=0.125
2024-10-08 11:23:44,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=249000.0, ans=0.07
2024-10-08 11:23:55,214 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=249120.0, ans=0.125
2024-10-08 11:24:26,704 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.435e+02 4.416e+02 4.967e+02 5.627e+02 8.971e+02, threshold=9.934e+02, percent-clipped=0.0
2024-10-08 11:24:34,732 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.50 vs. limit=15.0
2024-10-08 11:24:45,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=249360.0, ans=0.125
2024-10-08 11:25:08,863 INFO [train.py:1136] (0/2) Epoch 26, batch 400, loss[loss=0.1907, simple_loss=0.2898, pruned_loss=0.04579, over 86058.00 frames. ], tot_loss[loss=0.1862, simple_loss=0.285, pruned_loss=0.04377, over 14846012.63 frames. ], batch size: 721, lr: 9.76e-03, grad_scale: 16.0
2024-10-08 11:25:37,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=249720.0, ans=0.2
2024-10-08 11:25:40,748 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=249720.0, ans=0.0
2024-10-08 11:26:12,759 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=249960.0, ans=0.09899494936611666
2024-10-08 11:26:39,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.whiten.whitening_limit, batch_count=250080.0, ans=15.0
2024-10-08 11:26:42,863 INFO [train.py:1136] (0/2) Epoch 26, batch 450, loss[loss=0.183, simple_loss=0.2852, pruned_loss=0.04039, over 87340.00 frames. ], tot_loss[loss=0.1861, simple_loss=0.2846, pruned_loss=0.04378, over 15367433.32 frames. ], batch size: 439, lr: 9.75e-03, grad_scale: 16.0
2024-10-08 11:27:19,750 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.35 vs. limit=22.5
2024-10-08 11:27:33,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=250440.0, ans=0.025
2024-10-08 11:27:38,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=250440.0, ans=0.025
2024-10-08 11:27:39,745 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.720e+02 4.234e+02 4.795e+02 5.319e+02 1.140e+03, threshold=9.589e+02, percent-clipped=1.0
2024-10-08 11:27:54,655 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.73 vs. limit=15.0
2024-10-08 11:28:18,492 INFO [train.py:1136] (0/2) Epoch 26, batch 500, loss[loss=0.1935, simple_loss=0.2922, pruned_loss=0.04735, over 86434.00 frames. ], tot_loss[loss=0.187, simple_loss=0.2855, pruned_loss=0.0442, over 15733440.70 frames. ], batch size: 620, lr: 9.74e-03, grad_scale: 16.0
2024-10-08 11:28:19,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=250800.0, ans=0.125
2024-10-08 11:28:28,306 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-10-08 11:28:38,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=250920.0, ans=0.0
2024-10-08 11:29:15,351 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.224e-03
2024-10-08 11:29:23,131 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.17 vs. limit=15.0
2024-10-08 11:29:48,560 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.94 vs. limit=15.0
2024-10-08 11:29:51,153 INFO [train.py:1136] (0/2) Epoch 26, batch 550, loss[loss=0.167, simple_loss=0.2593, pruned_loss=0.03735, over 86433.00 frames. ], tot_loss[loss=0.1864, simple_loss=0.2849, pruned_loss=0.04397, over 16069497.41 frames. ], batch size: 213, lr: 9.73e-03, grad_scale: 16.0
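For monitoring, the batch-progress lines are easy to scrape into training curves. A small ad hoc parser; the regex and helper below match only the line format visible in this log:

```python
import re

# Pattern for the batch-progress lines emitted by train.py:1136.
LOSS_RE = re.compile(
    r"Epoch (\d+), batch (\d+), .*?"
    r"tot_loss\[loss=([\d.]+), simple_loss=([\d.]+), pruned_loss=([\d.]+), "
    r"over ([\d.]+) frames\."
)

def parse_progress(log_text: str):
    """Extract (epoch, batch, tot_loss, simple_loss, pruned_loss, frames) tuples."""
    out = []
    for m in LOSS_RE.finditer(log_text):
        epoch, batch = int(m.group(1)), int(m.group(2))
        loss, simple, pruned = (float(m.group(i)) for i in (3, 4, 5))
        out.append((epoch, batch, loss, simple, pruned, float(m.group(6))))
    return out

line = ("2024-10-08 11:29:51,153 INFO [train.py:1136] (0/2) Epoch 26, batch 550, "
        "loss[loss=0.167, simple_loss=0.2593, pruned_loss=0.03735, over 86433.00 frames. ], "
        "tot_loss[loss=0.1864, simple_loss=0.2849, pruned_loss=0.04397, "
        "over 16069497.41 frames. ], batch size: 213, lr: 9.73e-03, grad_scale: 16.0")
print(parse_progress(line))  # [(26, 550, 0.1864, 0.2849, 0.04397, 16069497.41)]
```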
2024-10-08 11:30:07,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.whiten.whitening_limit, batch_count=251400.0, ans=15.0
2024-10-08 11:30:40,574 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.77 vs. limit=15.0
2024-10-08 11:30:49,127 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.543e+02 3.999e+02 4.529e+02 5.302e+02 8.036e+02, threshold=9.059e+02, percent-clipped=0.0
2024-10-08 11:31:23,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=251880.0, ans=0.125
2024-10-08 11:31:28,531 INFO [train.py:1136] (0/2) Epoch 26, batch 600, loss[loss=0.1764, simple_loss=0.2717, pruned_loss=0.04058, over 86625.00 frames. ], tot_loss[loss=0.187, simple_loss=0.2854, pruned_loss=0.0443, over 16246636.46 frames. ], batch size: 246, lr: 9.72e-03, grad_scale: 16.0
2024-10-08 11:31:43,509 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=252000.0, ans=0.07
2024-10-08 11:32:34,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=252360.0, ans=0.125
2024-10-08 11:32:54,353 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.10 vs. limit=6.0
2024-10-08 11:33:05,562 INFO [train.py:1136] (0/2) Epoch 26, batch 650, loss[loss=0.1952, simple_loss=0.2939, pruned_loss=0.04827, over 86397.00 frames. ], tot_loss[loss=0.187, simple_loss=0.2856, pruned_loss=0.04417, over 16420404.75 frames. ], batch size: 667, lr: 9.71e-03, grad_scale: 16.0
2024-10-08 11:33:15,500 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.17 vs. limit=15.0
2024-10-08 11:33:28,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=252720.0, ans=0.125
2024-10-08 11:33:33,290 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-10-08 11:33:45,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=252840.0, ans=0.2
2024-10-08 11:33:59,075 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.732e+02 4.392e+02 4.948e+02 5.633e+02 7.950e+02, threshold=9.896e+02, percent-clipped=0.0
2024-10-08 11:34:00,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=252960.0, ans=0.0
2024-10-08 11:34:32,053 INFO [train.py:1136] (0/2) Epoch 26, batch 700, loss[loss=0.1796, simple_loss=0.2803, pruned_loss=0.03944, over 87435.00 frames. ], tot_loss[loss=0.1872, simple_loss=0.286, pruned_loss=0.04418, over 16590790.20 frames. ], batch size: 372, lr: 9.70e-03, grad_scale: 16.0
2024-10-08 11:34:57,999 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=253320.0, ans=0.0
2024-10-08 11:34:58,036 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=253320.0, ans=0.125
2024-10-08 11:35:09,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=253440.0, ans=0.125
2024-10-08 11:35:11,377 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.56 vs. limit=15.0
2024-10-08 11:35:15,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=253440.0, ans=0.04949747468305833
2024-10-08 11:35:19,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=253440.0, ans=0.0
2024-10-08 11:35:28,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=253560.0, ans=0.1
2024-10-08 11:35:47,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=253680.0, ans=0.125
2024-10-08 11:35:50,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=253680.0, ans=0.04949747468305833
2024-10-08 11:35:52,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=253680.0, ans=0.09899494936611666
2024-10-08 11:35:55,065 INFO [train.py:1136] (0/2) Epoch 26, batch 750, loss[loss=0.1759, simple_loss=0.2695, pruned_loss=0.04113, over 86681.00 frames. ], tot_loss[loss=0.1869, simple_loss=0.2855, pruned_loss=0.04417, over 16691691.94 frames. ], batch size: 229, lr: 9.69e-03, grad_scale: 16.0
2024-10-08 11:36:04,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=253800.0, ans=0.0
2024-10-08 11:36:04,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=253800.0, ans=0.0
2024-10-08 11:36:44,483 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.475e+02 4.418e+02 5.064e+02 5.763e+02 8.132e+02, threshold=1.013e+03, percent-clipped=0.0
2024-10-08 11:36:52,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=254160.0, ans=0.1
2024-10-08 11:36:56,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=254160.0, ans=0.125
2024-10-08 11:36:59,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=254160.0, ans=0.1
2024-10-08 11:37:20,205 INFO [train.py:1136] (0/2) Epoch 26, batch 800, loss[loss=0.1725, simple_loss=0.2715, pruned_loss=0.03672, over 86742.00 frames. ], tot_loss[loss=0.1884, simple_loss=0.2868, pruned_loss=0.04503, over 16692198.70 frames. ], batch size: 246, lr: 9.68e-03, grad_scale: 32.0
2024-10-08 11:37:22,652 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=254400.0, ans=0.1
2024-10-08 11:37:35,950 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=254520.0, ans=0.1
2024-10-08 11:37:47,174 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-26.pt
2024-10-08 11:38:40,934 INFO [train.py:1136] (0/2) Epoch 27, batch 0, loss[loss=0.1836, simple_loss=0.2767, pruned_loss=0.04522, over 87291.00 frames. ], tot_loss[loss=0.1836, simple_loss=0.2767, pruned_loss=0.04522, over 87291.00 frames. ], batch size: 280, lr: 9.49e-03, grad_scale: 32.0
2024-10-08 11:38:40,935 INFO [train.py:1159] (0/2) Computing validation loss
2024-10-08 11:38:52,251 INFO [train.py:1168] (0/2) Epoch 27, validation: loss=0.1682, simple_loss=0.2807, pruned_loss=0.02781, over 1382211.00 frames.
2024-10-08 11:38:52,252 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB
2024-10-08 11:39:46,568 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.22 vs. limit=15.0
2024-10-08 11:39:58,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=254952.0, ans=0.0
2024-10-08 11:39:58,741 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0
2024-10-08 11:40:30,040 INFO [train.py:1136] (0/2) Epoch 27, batch 50, loss[loss=0.1801, simple_loss=0.2731, pruned_loss=0.04356, over 86438.00 frames. ], tot_loss[loss=0.1864, simple_loss=0.2858, pruned_loss=0.04355, over 3861521.87 frames. ], batch size: 197, lr: 9.48e-03, grad_scale: 16.0
2024-10-08 11:40:41,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=255192.0, ans=0.2
2024-10-08 11:40:57,514 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.623e+02 4.409e+02 4.906e+02 6.007e+02 7.735e+02, threshold=9.812e+02, percent-clipped=0.0
2024-10-08 11:41:00,193 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.81 vs. limit=6.0
2024-10-08 11:41:17,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=255432.0, ans=0.0
2024-10-08 11:41:28,159 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=255552.0, ans=0.0
2024-10-08 11:41:28,851 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.24 vs. limit=22.5
2024-10-08 11:42:07,103 INFO [train.py:1136] (0/2) Epoch 27, batch 100, loss[loss=0.1895, simple_loss=0.2895, pruned_loss=0.04478, over 87115.00 frames. ], tot_loss[loss=0.1872, simple_loss=0.2864, pruned_loss=0.04397, over 6747267.66 frames. ], batch size: 583, lr: 9.47e-03, grad_scale: 16.0
2024-10-08 11:42:10,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=255792.0, ans=0.125
2024-10-08 11:42:10,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=255792.0, ans=0.0
2024-10-08 11:42:12,832 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.48 vs. limit=15.0
2024-10-08 11:42:17,856 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-10-08 11:42:32,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=255912.0, ans=0.1
2024-10-08 11:42:48,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=256032.0, ans=0.1
2024-10-08 11:42:54,389 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=256032.0, ans=0.125
2024-10-08 11:43:07,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=256152.0, ans=0.125
2024-10-08 11:43:21,326 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.32 vs. limit=6.0
2024-10-08 11:43:27,930 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=256272.0, ans=0.0
2024-10-08 11:43:37,983 INFO [train.py:1136] (0/2) Epoch 27, batch 150, loss[loss=0.1812, simple_loss=0.2789, pruned_loss=0.04177, over 87171.00 frames. ], tot_loss[loss=0.1867, simple_loss=0.2858, pruned_loss=0.04382, over 9055039.25 frames. ], batch size: 350, lr: 9.46e-03, grad_scale: 16.0
2024-10-08 11:43:42,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=256392.0, ans=0.125
2024-10-08 11:44:06,643 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.525e+02 4.056e+02 4.585e+02 5.444e+02 9.594e+02, threshold=9.169e+02, percent-clipped=0.0
2024-10-08 11:44:08,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=256512.0, ans=0.125
2024-10-08 11:44:15,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=256632.0, ans=0.125
2024-10-08 11:44:22,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=256632.0, ans=0.2
2024-10-08 11:44:36,249 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.21 vs. limit=10.0
2024-10-08 11:45:13,185 INFO [train.py:1136] (0/2) Epoch 27, batch 200, loss[loss=0.1816, simple_loss=0.2843, pruned_loss=0.03938, over 87318.00 frames. ], tot_loss[loss=0.1862, simple_loss=0.2849, pruned_loss=0.04378, over 10845997.48 frames. ], batch size: 393, lr: 9.45e-03, grad_scale: 16.0
], batch size: 393, lr: 9.45e-03, grad_scale: 16.0 2024-10-08 11:45:35,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=257112.0, ans=0.0 2024-10-08 11:45:46,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=257112.0, ans=0.125 2024-10-08 11:46:25,687 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=257352.0, ans=0.125 2024-10-08 11:46:49,955 INFO [train.py:1136] (0/2) Epoch 27, batch 250, loss[loss=0.2086, simple_loss=0.3089, pruned_loss=0.05418, over 82085.00 frames. ], tot_loss[loss=0.1867, simple_loss=0.2851, pruned_loss=0.04412, over 12226128.42 frames. ], batch size: 1245, lr: 9.44e-03, grad_scale: 16.0 2024-10-08 11:46:52,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=257592.0, ans=0.0 2024-10-08 11:46:59,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=257592.0, ans=0.125 2024-10-08 11:47:15,985 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.605e+02 4.282e+02 4.818e+02 5.448e+02 7.597e+02, threshold=9.637e+02, percent-clipped=0.0 2024-10-08 11:47:25,081 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=257832.0, ans=0.1 2024-10-08 11:47:46,426 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=257952.0, ans=0.125 2024-10-08 11:48:04,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=258072.0, ans=0.1 2024-10-08 11:48:11,913 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.04 vs. limit=22.5 2024-10-08 11:48:23,026 INFO [train.py:1136] (0/2) Epoch 27, batch 300, loss[loss=0.2016, simple_loss=0.3017, pruned_loss=0.05075, over 81912.00 frames. ], tot_loss[loss=0.1854, simple_loss=0.284, pruned_loss=0.04343, over 13341364.78 frames. ], batch size: 1245, lr: 9.43e-03, grad_scale: 16.0 2024-10-08 11:48:45,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=258312.0, ans=0.125 2024-10-08 11:48:53,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=258312.0, ans=0.2 2024-10-08 11:49:00,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=258312.0, ans=0.0 2024-10-08 11:49:20,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=258432.0, ans=0.125 2024-10-08 11:49:37,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=258552.0, ans=0.125 2024-10-08 11:49:40,461 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.21 vs. limit=22.5 2024-10-08 11:50:00,455 INFO [train.py:1136] (0/2) Epoch 27, batch 350, loss[loss=0.1811, simple_loss=0.2809, pruned_loss=0.04065, over 87365.00 frames. 
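
The per-batch loss fields are related: throughout this log, loss = 0.5 * simple_loss + pruned_loss to the printed precision (epoch 27, batch 250: 0.5 * 0.3089 + 0.05418 = 0.2086), as expected for a pruned-transducer objective whose simple (trivial-joiner) term is down-weighted. The check below reads the 0.5 weight off the logged numbers rather than from the training code.

def combined_loss(simple_loss, pruned_loss, simple_loss_scale=0.5):
    # Weight inferred from the logged numbers, not from the source code.
    return simple_loss_scale * simple_loss + pruned_loss

assert abs(combined_loss(0.3089, 0.05418) - 0.2086) < 5e-4  # epoch 27, batch 250
assert abs(combined_loss(0.3011, 0.04755) - 0.1981) < 5e-4  # epoch 27, batch 400
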
], tot_loss[loss=0.1869, simple_loss=0.2853, pruned_loss=0.04423, over 14120483.71 frames. ], batch size: 415, lr: 9.42e-03, grad_scale: 16.0 2024-10-08 11:50:00,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=258792.0, ans=0.125 2024-10-08 11:50:16,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=258792.0, ans=0.1 2024-10-08 11:50:16,865 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=258792.0, ans=0.1 2024-10-08 11:50:16,990 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=258792.0, ans=0.125 2024-10-08 11:50:18,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=258912.0, ans=0.0 2024-10-08 11:50:26,522 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.538e+02 4.215e+02 4.561e+02 5.162e+02 8.052e+02, threshold=9.122e+02, percent-clipped=0.0 2024-10-08 11:50:27,155 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=258912.0, ans=0.0 2024-10-08 11:50:48,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=259032.0, ans=0.125 2024-10-08 11:51:01,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=259152.0, ans=0.0 2024-10-08 11:51:11,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=259152.0, ans=0.1 2024-10-08 11:51:25,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=259272.0, ans=0.025 2024-10-08 11:51:25,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=259272.0, ans=0.125 2024-10-08 11:51:35,760 INFO [train.py:1136] (0/2) Epoch 27, batch 400, loss[loss=0.1981, simple_loss=0.3011, pruned_loss=0.04755, over 84437.00 frames. ], tot_loss[loss=0.1868, simple_loss=0.2855, pruned_loss=0.04404, over 14803004.42 frames. ], batch size: 958, lr: 9.41e-03, grad_scale: 32.0 2024-10-08 11:51:39,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=259392.0, ans=0.025 2024-10-08 11:52:49,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=259752.0, ans=0.125 2024-10-08 11:52:58,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=259872.0, ans=0.0 2024-10-08 11:53:10,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=259992.0, ans=0.0 2024-10-08 11:53:11,944 INFO [train.py:1136] (0/2) Epoch 27, batch 450, loss[loss=0.1798, simple_loss=0.2693, pruned_loss=0.04515, over 85945.00 frames. ], tot_loss[loss=0.1864, simple_loss=0.2851, pruned_loss=0.04389, over 15326101.25 frames. 
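
tot_loss is not a plain epoch average: the frame counts it is reported over climb steeply (about 3.9M frames after 50 batches) and then level off near 16-17M, the signature of an exponentially decayed accumulator. The sketch below reproduces that shape; the decay constant of 1 - 1/200 per batch is inferred from the saturation level, not taken from the code.

def update_tot(tot, cur, decay=1.0 - 1.0 / 200):
    """Decayed running sums of per-batch statistics.

    tot and cur are dicts like {"loss": ..., "frames": ...}; the printed
    tot_loss is tot["loss"] / tot["frames"]. Sketch, not the project's
    MetricsTracker; the decay constant is inferred, not read from code.
    """
    return {k: tot[k] * decay + cur[k] for k in tot}

# With ~87e3 frames per batch the frame count saturates near
# 87e3 / (1 - decay) = 17.4e6, matching the 16-17e6 figures in this log.
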
], batch size: 180, lr: 9.40e-03, grad_scale: 16.0 2024-10-08 11:53:40,537 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.615e+02 4.537e+02 5.087e+02 5.880e+02 8.276e+02, threshold=1.017e+03, percent-clipped=0.0 2024-10-08 11:53:44,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=260112.0, ans=0.125 2024-10-08 11:54:01,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=260232.0, ans=0.09899494936611666 2024-10-08 11:54:09,622 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=260352.0, ans=0.125 2024-10-08 11:54:45,709 INFO [train.py:1136] (0/2) Epoch 27, batch 500, loss[loss=0.1926, simple_loss=0.2948, pruned_loss=0.04517, over 85177.00 frames. ], tot_loss[loss=0.1865, simple_loss=0.2852, pruned_loss=0.04386, over 15691373.27 frames. ], batch size: 866, lr: 9.39e-03, grad_scale: 16.0 2024-10-08 11:54:53,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=260592.0, ans=0.04949747468305833 2024-10-08 11:54:53,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=260592.0, ans=0.025 2024-10-08 11:54:55,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=260592.0, ans=0.125 2024-10-08 11:55:09,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=260712.0, ans=0.1 2024-10-08 11:55:26,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=260832.0, ans=0.0 2024-10-08 11:56:00,579 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.50 vs. limit=15.0 2024-10-08 11:56:06,870 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=261072.0, ans=0.2 2024-10-08 11:56:18,617 INFO [train.py:1136] (0/2) Epoch 27, batch 550, loss[loss=0.1764, simple_loss=0.2768, pruned_loss=0.038, over 87395.00 frames. ], tot_loss[loss=0.1853, simple_loss=0.284, pruned_loss=0.04325, over 16047640.92 frames. ], batch size: 372, lr: 9.38e-03, grad_scale: 16.0 2024-10-08 11:56:28,395 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=261192.0, ans=0.05 2024-10-08 11:56:30,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=261192.0, ans=0.025 2024-10-08 11:56:49,021 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.605e+02 4.278e+02 4.636e+02 5.434e+02 6.943e+02, threshold=9.271e+02, percent-clipped=0.0 2024-10-08 11:57:45,445 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=261672.0, ans=0.1 2024-10-08 11:57:52,298 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=261672.0, ans=0.125 2024-10-08 11:57:55,174 INFO [train.py:1136] (0/2) Epoch 27, batch 600, loss[loss=0.1794, simple_loss=0.2767, pruned_loss=0.04108, over 87083.00 frames. 
], tot_loss[loss=0.1857, simple_loss=0.2844, pruned_loss=0.04351, over 16280520.75 frames. ], batch size: 350, lr: 9.37e-03, grad_scale: 16.0 2024-10-08 11:57:55,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=261792.0, ans=0.125 2024-10-08 11:58:07,402 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.43 vs. limit=10.0 2024-10-08 11:58:23,801 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-10-08 11:58:44,204 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=262032.0, ans=0.0 2024-10-08 11:58:51,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=262152.0, ans=0.025 2024-10-08 11:58:54,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=262152.0, ans=0.1 2024-10-08 11:59:14,283 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=262272.0, ans=0.125 2024-10-08 11:59:31,341 INFO [train.py:1136] (0/2) Epoch 27, batch 650, loss[loss=0.1689, simple_loss=0.275, pruned_loss=0.03137, over 87116.00 frames. ], tot_loss[loss=0.1854, simple_loss=0.2843, pruned_loss=0.04323, over 16447515.45 frames. ], batch size: 517, lr: 9.36e-03, grad_scale: 16.0 2024-10-08 11:59:35,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=262392.0, ans=0.1 2024-10-08 11:59:35,610 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.69 vs. limit=15.0 2024-10-08 11:59:52,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=262512.0, ans=0.2 2024-10-08 11:59:58,897 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.604e+02 4.129e+02 4.787e+02 5.579e+02 7.596e+02, threshold=9.573e+02, percent-clipped=0.0 2024-10-08 12:00:16,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=262632.0, ans=0.125 2024-10-08 12:00:31,922 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.99 vs. limit=15.0 2024-10-08 12:00:39,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=262872.0, ans=10.0 2024-10-08 12:00:56,063 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.56 vs. limit=12.0 2024-10-08 12:00:56,727 INFO [train.py:1136] (0/2) Epoch 27, batch 700, loss[loss=0.1788, simple_loss=0.2785, pruned_loss=0.03953, over 87107.00 frames. ], tot_loss[loss=0.1853, simple_loss=0.2845, pruned_loss=0.04307, over 16601507.67 frames. 
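
The scaling.py:1024 Whitening entries compare a per-module metric against a limit; roughly, the metric measures how far the per-group feature covariance is from a multiple of the identity (1.0 for perfectly white features), and a corrective gradient is applied only when it exceeds the limit. The following is a sketch of one such metric; the exact icefall definition may differ in detail.

import torch

def whitening_metric(x, num_groups):
    """How 'non-white' the features are: 1.0 means covariance ~ c * I.

    Sketch of the quantity behind "Whitening: ... metric=M vs. limit=L";
    x has shape (num_frames, num_channels).
    """
    num_frames, num_channels = x.shape
    cpg = num_channels // num_groups  # channels per group
    xg = x.reshape(num_frames, num_groups, cpg).permute(1, 0, 2)
    cov = torch.matmul(xg.transpose(1, 2), xg) / num_frames  # (groups, cpg, cpg)
    mean_diag = cov.diagonal(dim1=1, dim2=2).mean()
    mean_sq = (cov ** 2).sum() / (num_groups * cpg)
    return mean_sq / (mean_diag ** 2)

# e.g. metric=10.22 vs. limit=15.0 above is within bounds and left alone.
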
], batch size: 350, lr: 9.35e-03, grad_scale: 16.0 2024-10-08 12:01:16,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=263112.0, ans=0.04949747468305833 2024-10-08 12:01:21,060 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=263112.0, ans=0.0 2024-10-08 12:01:38,775 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-10-08 12:02:06,526 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.91 vs. limit=12.0 2024-10-08 12:02:10,249 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=263472.0, ans=0.0 2024-10-08 12:02:13,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=263472.0, ans=0.025 2024-10-08 12:02:16,770 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=263472.0, ans=0.0 2024-10-08 12:02:19,715 INFO [train.py:1136] (0/2) Epoch 27, batch 750, loss[loss=0.1716, simple_loss=0.2719, pruned_loss=0.03561, over 87081.00 frames. ], tot_loss[loss=0.1849, simple_loss=0.2842, pruned_loss=0.04285, over 16711113.96 frames. ], batch size: 264, lr: 9.34e-03, grad_scale: 16.0 2024-10-08 12:02:20,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=263592.0, ans=0.125 2024-10-08 12:02:34,985 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.53 vs. limit=10.0 2024-10-08 12:02:42,236 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.23 vs. limit=15.0 2024-10-08 12:02:45,895 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.540e+02 4.287e+02 5.108e+02 5.958e+02 1.235e+03, threshold=1.022e+03, percent-clipped=1.0 2024-10-08 12:02:48,554 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.18 vs. limit=15.0 2024-10-08 12:02:55,662 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=263832.0, ans=0.07 2024-10-08 12:03:05,372 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=263832.0, ans=0.04949747468305833 2024-10-08 12:03:15,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=263952.0, ans=0.125 2024-10-08 12:03:19,650 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.66 vs. limit=6.0 2024-10-08 12:03:44,168 INFO [train.py:1136] (0/2) Epoch 27, batch 800, loss[loss=0.1741, simple_loss=0.2734, pruned_loss=0.03743, over 87391.00 frames. ], tot_loss[loss=0.185, simple_loss=0.2841, pruned_loss=0.04295, over 16742636.64 frames. 
], batch size: 393, lr: 9.33e-03, grad_scale: 32.0 2024-10-08 12:03:44,473 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=264192.0, ans=0.125 2024-10-08 12:04:04,326 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=264312.0, ans=0.1 2024-10-08 12:04:09,730 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-27.pt 2024-10-08 12:05:01,593 INFO [train.py:1136] (0/2) Epoch 28, batch 0, loss[loss=0.1823, simple_loss=0.2801, pruned_loss=0.04229, over 87244.00 frames. ], tot_loss[loss=0.1823, simple_loss=0.2801, pruned_loss=0.04229, over 87244.00 frames. ], batch size: 330, lr: 9.16e-03, grad_scale: 32.0 2024-10-08 12:05:01,594 INFO [train.py:1159] (0/2) Computing validation loss 2024-10-08 12:05:12,630 INFO [train.py:1168] (0/2) Epoch 28, validation: loss=0.1673, simple_loss=0.2786, pruned_loss=0.02803, over 1382211.00 frames. 2024-10-08 12:05:12,630 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB 2024-10-08 12:05:55,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=264624.0, ans=0.0 2024-10-08 12:06:11,259 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0 2024-10-08 12:06:25,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=264864.0, ans=0.0 2024-10-08 12:06:45,410 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.535e+02 4.319e+02 4.955e+02 5.809e+02 9.446e+02, threshold=9.909e+02, percent-clipped=0.0 2024-10-08 12:06:47,154 INFO [train.py:1136] (0/2) Epoch 28, batch 50, loss[loss=0.1749, simple_loss=0.2661, pruned_loss=0.04189, over 86215.00 frames. ], tot_loss[loss=0.1863, simple_loss=0.2851, pruned_loss=0.04375, over 3873844.00 frames. ], batch size: 197, lr: 9.15e-03, grad_scale: 32.0 2024-10-08 12:06:47,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=264984.0, ans=0.0 2024-10-08 12:06:54,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=264984.0, ans=0.125 2024-10-08 12:07:05,288 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.48 vs. limit=22.5 2024-10-08 12:07:23,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=265224.0, ans=0.0 2024-10-08 12:07:48,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=265344.0, ans=0.1 2024-10-08 12:07:56,180 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.29 vs. limit=22.5 2024-10-08 12:08:12,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=265464.0, ans=0.0 2024-10-08 12:08:12,720 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=5.90 vs. 
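
At the first batch of each epoch the trainer pauses to compute a validation loss over the full dev set (1382211 frames here). A generic sketch of that loop follows; compute_loss is passed in as a parameter because the project's actual per-batch loss helper is not shown in this log.

import torch

def compute_validation_loss(model, dev_loader, compute_loss):
    """Frame-weighted average loss over the dev set.

    Sketch of the "Computing validation loss" step; compute_loss is an
    assumed helper returning (summed_loss, num_frames) for one batch.
    """
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in dev_loader:
            loss_sum, frames = compute_loss(model, batch)
            tot_loss += float(loss_sum)
            tot_frames += frames
    model.train()
    return tot_loss / tot_frames  # e.g. 0.1673 over 1382211 frames
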
limit=15.0 2024-10-08 12:08:13,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=265464.0, ans=0.0 2024-10-08 12:08:15,694 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=265464.0, ans=0.125 2024-10-08 12:08:22,147 INFO [train.py:1136] (0/2) Epoch 28, batch 100, loss[loss=0.1665, simple_loss=0.2711, pruned_loss=0.03098, over 87307.00 frames. ], tot_loss[loss=0.1861, simple_loss=0.2847, pruned_loss=0.04368, over 6799621.68 frames. ], batch size: 439, lr: 9.14e-03, grad_scale: 32.0 2024-10-08 12:09:14,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=265824.0, ans=0.125 2024-10-08 12:09:17,990 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=2.16 vs. limit=12.0 2024-10-08 12:09:26,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=265944.0, ans=0.125 2024-10-08 12:09:55,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=266064.0, ans=0.125 2024-10-08 12:09:56,882 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.672e+02 4.274e+02 4.945e+02 5.753e+02 8.446e+02, threshold=9.890e+02, percent-clipped=0.0 2024-10-08 12:09:58,565 INFO [train.py:1136] (0/2) Epoch 28, batch 150, loss[loss=0.1762, simple_loss=0.2808, pruned_loss=0.03577, over 87271.00 frames. ], tot_loss[loss=0.1862, simple_loss=0.2852, pruned_loss=0.04354, over 9059553.54 frames. ], batch size: 464, lr: 9.13e-03, grad_scale: 32.0 2024-10-08 12:10:13,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=266184.0, ans=0.125 2024-10-08 12:10:47,804 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-10-08 12:11:10,348 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.97 vs. limit=15.0 2024-10-08 12:11:13,103 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=266664.0, ans=0.125 2024-10-08 12:11:32,366 INFO [train.py:1136] (0/2) Epoch 28, batch 200, loss[loss=0.1734, simple_loss=0.271, pruned_loss=0.03786, over 87153.00 frames. ], tot_loss[loss=0.1851, simple_loss=0.2839, pruned_loss=0.04313, over 10877061.68 frames. ], batch size: 350, lr: 9.12e-03, grad_scale: 32.0 2024-10-08 12:11:34,516 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=266784.0, ans=0.125 2024-10-08 12:11:37,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=266784.0, ans=0.05 2024-10-08 12:11:40,204 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.79 vs. 
limit=15.0 2024-10-08 12:12:59,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=267264.0, ans=0.125 2024-10-08 12:13:05,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=267264.0, ans=0.04949747468305833 2024-10-08 12:13:08,763 INFO [train.py:1136] (0/2) Epoch 28, batch 250, loss[loss=0.1787, simple_loss=0.2776, pruned_loss=0.03995, over 87240.00 frames. ], tot_loss[loss=0.185, simple_loss=0.2837, pruned_loss=0.04316, over 12244056.00 frames. ], batch size: 264, lr: 9.11e-03, grad_scale: 8.0 2024-10-08 12:13:10,345 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.669e+02 4.114e+02 4.512e+02 5.301e+02 8.173e+02, threshold=9.024e+02, percent-clipped=0.0 2024-10-08 12:13:44,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=267624.0, ans=0.2 2024-10-08 12:14:04,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=267744.0, ans=0.125 2024-10-08 12:14:04,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=267744.0, ans=0.2 2024-10-08 12:14:38,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=267864.0, ans=0.0 2024-10-08 12:14:41,930 INFO [train.py:1136] (0/2) Epoch 28, batch 300, loss[loss=0.1677, simple_loss=0.2646, pruned_loss=0.03541, over 86805.00 frames. ], tot_loss[loss=0.1848, simple_loss=0.2834, pruned_loss=0.04305, over 13294030.19 frames. ], batch size: 246, lr: 9.10e-03, grad_scale: 8.0 2024-10-08 12:14:47,499 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=267984.0, ans=0.125 2024-10-08 12:14:59,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=268104.0, ans=0.125 2024-10-08 12:15:34,113 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=268224.0, ans=0.0 2024-10-08 12:15:48,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=268344.0, ans=0.125 2024-10-08 12:16:08,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=268464.0, ans=0.05 2024-10-08 12:16:13,947 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.08 vs. limit=15.0 2024-10-08 12:16:18,169 INFO [train.py:1136] (0/2) Epoch 28, batch 350, loss[loss=0.1954, simple_loss=0.2979, pruned_loss=0.04642, over 85339.00 frames. ], tot_loss[loss=0.1848, simple_loss=0.2838, pruned_loss=0.04294, over 14121237.13 frames. ], batch size: 866, lr: 9.09e-03, grad_scale: 8.0 2024-10-08 12:16:19,011 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.40 vs. 
limit=15.0 2024-10-08 12:16:19,775 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.615e+02 4.088e+02 4.502e+02 5.315e+02 1.746e+03, threshold=9.003e+02, percent-clipped=3.0 2024-10-08 12:16:30,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=268584.0, ans=0.2 2024-10-08 12:16:39,933 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=268704.0, ans=0.125 2024-10-08 12:17:10,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=268824.0, ans=0.125 2024-10-08 12:17:20,779 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.30 vs. limit=15.0 2024-10-08 12:17:22,128 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=268944.0, ans=0.125 2024-10-08 12:17:44,748 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.55 vs. limit=15.0 2024-10-08 12:17:45,791 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=269064.0, ans=0.0 2024-10-08 12:17:54,511 INFO [train.py:1136] (0/2) Epoch 28, batch 400, loss[loss=0.1697, simple_loss=0.2714, pruned_loss=0.03401, over 87359.00 frames. ], tot_loss[loss=0.1839, simple_loss=0.2828, pruned_loss=0.04247, over 14778035.82 frames. ], batch size: 372, lr: 9.09e-03, grad_scale: 16.0 2024-10-08 12:18:06,797 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=269184.0, ans=0.125 2024-10-08 12:18:22,465 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.26 vs. limit=10.0 2024-10-08 12:18:30,674 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=269424.0, ans=0.1 2024-10-08 12:18:37,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=269424.0, ans=0.125 2024-10-08 12:19:04,661 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=269544.0, ans=0.125 2024-10-08 12:19:12,475 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.51 vs. limit=15.0 2024-10-08 12:19:12,489 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.17 vs. limit=15.0 2024-10-08 12:19:13,364 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=269664.0, ans=0.2 2024-10-08 12:19:28,516 INFO [train.py:1136] (0/2) Epoch 28, batch 450, loss[loss=0.1746, simple_loss=0.2804, pruned_loss=0.03444, over 87402.00 frames. ], tot_loss[loss=0.184, simple_loss=0.2831, pruned_loss=0.04243, over 15292349.35 frames. 
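
The grad_scale column (32.0, 16.0, 8.0, ...) is fp16 loss-scaling state: the scale is halved when a batch overflows and grown back after a stretch of stable steps, which is why it drifts between powers of two across this log. A generic torch.cuda.amp loop showing those mechanics, not the project's own training loop:

import torch

def train_epoch_amp(model, optimizer, loader, device):
    """Generic fp16 training loop, not the project's own code.

    scaler.get_scale() is the grad_scale printed in the log: halved when a
    batch produces inf/nan gradients, grown back after stable steps.
    """
    scaler = torch.cuda.amp.GradScaler()
    for batch in loader:
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = model(batch.to(device))  # assumed: forward returns the loss
        scaler.scale(loss).backward()
        scaler.step(optimizer)  # skipped internally if gradients overflowed
        scaler.update()
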
], batch size: 490, lr: 9.08e-03, grad_scale: 16.0 2024-10-08 12:19:30,113 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.583e+02 4.130e+02 4.629e+02 5.203e+02 7.917e+02, threshold=9.258e+02, percent-clipped=0.0 2024-10-08 12:19:36,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=269784.0, ans=0.125 2024-10-08 12:19:40,297 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.77 vs. limit=15.0 2024-10-08 12:19:55,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=269904.0, ans=0.125 2024-10-08 12:20:02,554 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=269904.0, ans=0.1 2024-10-08 12:20:30,475 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=270144.0, ans=0.0 2024-10-08 12:20:32,405 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=270144.0, ans=0.125 2024-10-08 12:20:53,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=270264.0, ans=0.125 2024-10-08 12:21:03,566 INFO [train.py:1136] (0/2) Epoch 28, batch 500, loss[loss=0.2043, simple_loss=0.3032, pruned_loss=0.05267, over 81741.00 frames. ], tot_loss[loss=0.1846, simple_loss=0.2836, pruned_loss=0.04281, over 15701439.09 frames. ], batch size: 1245, lr: 9.07e-03, grad_scale: 16.0 2024-10-08 12:21:12,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=270384.0, ans=0.2 2024-10-08 12:22:15,029 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=270744.0, ans=0.0 2024-10-08 12:22:36,138 INFO [train.py:1136] (0/2) Epoch 28, batch 550, loss[loss=0.1972, simple_loss=0.295, pruned_loss=0.04971, over 86791.00 frames. ], tot_loss[loss=0.1841, simple_loss=0.2832, pruned_loss=0.04253, over 16026487.54 frames. 
], batch size: 547, lr: 9.06e-03, grad_scale: 16.0 2024-10-08 12:22:40,096 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.590e+02 4.163e+02 4.513e+02 5.084e+02 7.851e+02, threshold=9.026e+02, percent-clipped=0.0 2024-10-08 12:23:16,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=271224.0, ans=0.0 2024-10-08 12:23:28,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=271224.0, ans=0.0 2024-10-08 12:23:40,866 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=271344.0, ans=0.125 2024-10-08 12:23:40,881 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=271344.0, ans=0.0 2024-10-08 12:23:54,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=271464.0, ans=0.1 2024-10-08 12:24:06,808 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=271464.0, ans=0.125 2024-10-08 12:24:11,987 INFO [train.py:1136] (0/2) Epoch 28, batch 600, loss[loss=0.1833, simple_loss=0.2779, pruned_loss=0.04433, over 87174.00 frames. ], tot_loss[loss=0.1839, simple_loss=0.2829, pruned_loss=0.04246, over 16274090.18 frames. ], batch size: 296, lr: 9.05e-03, grad_scale: 16.0 2024-10-08 12:24:18,240 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.78 vs. limit=15.0 2024-10-08 12:24:24,314 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-10-08 12:24:30,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=271704.0, ans=0.2 2024-10-08 12:24:51,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=271824.0, ans=0.125 2024-10-08 12:25:07,190 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=271944.0, ans=0.2 2024-10-08 12:25:09,360 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.39 vs. limit=15.0 2024-10-08 12:25:12,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=271944.0, ans=0.125 2024-10-08 12:25:24,971 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=271944.0, ans=0.0 2024-10-08 12:25:45,098 INFO [train.py:1136] (0/2) Epoch 28, batch 650, loss[loss=0.2253, simple_loss=0.3192, pruned_loss=0.06569, over 78567.00 frames. ], tot_loss[loss=0.184, simple_loss=0.2832, pruned_loss=0.04242, over 16458179.11 frames. 
], batch size: 1493, lr: 9.04e-03, grad_scale: 16.0 2024-10-08 12:25:46,759 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.566e+02 4.357e+02 4.888e+02 5.403e+02 7.662e+02, threshold=9.776e+02, percent-clipped=0.0 2024-10-08 12:25:58,001 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=272184.0, ans=0.2 2024-10-08 12:26:17,045 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.73 vs. limit=15.0 2024-10-08 12:26:19,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=272304.0, ans=0.125 2024-10-08 12:26:39,800 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.95 vs. limit=15.0 2024-10-08 12:26:44,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=272544.0, ans=0.125 2024-10-08 12:26:56,469 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.23 vs. limit=15.0 2024-10-08 12:27:14,480 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.08 vs. limit=15.0 2024-10-08 12:27:15,408 INFO [train.py:1136] (0/2) Epoch 28, batch 700, loss[loss=0.215, simple_loss=0.3038, pruned_loss=0.06311, over 69699.00 frames. ], tot_loss[loss=0.1849, simple_loss=0.2839, pruned_loss=0.04289, over 16551470.21 frames. ], batch size: 1960, lr: 9.03e-03, grad_scale: 16.0 2024-10-08 12:27:15,794 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-10-08 12:28:07,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=273144.0, ans=0.0 2024-10-08 12:28:27,166 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.71 vs. limit=6.0 2024-10-08 12:28:28,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=273264.0, ans=0.015 2024-10-08 12:28:40,294 INFO [train.py:1136] (0/2) Epoch 28, batch 750, loss[loss=0.165, simple_loss=0.2624, pruned_loss=0.03382, over 86709.00 frames. ], tot_loss[loss=0.1846, simple_loss=0.2835, pruned_loss=0.04279, over 16649381.24 frames. ], batch size: 246, lr: 9.02e-03, grad_scale: 8.0 2024-10-08 12:28:43,435 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.650e+02 4.245e+02 4.866e+02 5.718e+02 9.379e+02, threshold=9.732e+02, percent-clipped=0.0 2024-10-08 12:28:43,585 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=273384.0, ans=0.125 2024-10-08 12:29:20,038 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=273624.0, ans=0.0 2024-10-08 12:30:02,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=273984.0, ans=0.1 2024-10-08 12:30:03,991 INFO [train.py:1136] (0/2) Epoch 28, batch 800, loss[loss=0.2278, simple_loss=0.3201, pruned_loss=0.06771, over 78556.00 frames. 
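
"batch size" in these entries counts cuts, not frames: it swings from about 180 long utterances to 1960 short ones per batch while the frame totals stay comparable, which is how a duration-bucketed sampler behaves. A sketch with assumed arguments (the 3600 s budget and 30 buckets are assumptions, not read from this log):

from lhotse.dataset import DynamicBucketingSampler

def make_train_sampler(cuts):
    """Batches hold a fixed audio-duration budget, not a fixed cut count.

    Sketch under assumed arguments; grouping cuts of similar length into
    buckets is what lets one batch contain ~1960 short cuts and another
    only ~180 long ones.
    """
    return DynamicBucketingSampler(
        cuts,
        max_duration=3600.0,  # seconds of audio per batch (assumed)
        num_buckets=30,       # assumed
        shuffle=True,
        drop_last=True,
    )
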
], tot_loss[loss=0.1852, simple_loss=0.2838, pruned_loss=0.04327, over 16690451.00 frames. ], batch size: 1493, lr: 9.01e-03, grad_scale: 16.0 2024-10-08 12:30:04,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=273984.0, ans=0.0 2024-10-08 12:30:04,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=273984.0, ans=0.2 2024-10-08 12:30:11,188 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.34 vs. limit=15.0 2024-10-08 12:30:12,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=273984.0, ans=10.0 2024-10-08 12:30:13,671 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=273984.0, ans=0.09899494936611666 2024-10-08 12:30:22,585 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=274104.0, ans=0.1 2024-10-08 12:30:29,696 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-28.pt 2024-10-08 12:31:20,832 INFO [train.py:1136] (0/2) Epoch 29, batch 0, loss[loss=0.2005, simple_loss=0.3024, pruned_loss=0.04933, over 83240.00 frames. ], tot_loss[loss=0.2005, simple_loss=0.3024, pruned_loss=0.04933, over 83240.00 frames. ], batch size: 1077, lr: 8.85e-03, grad_scale: 32.0 2024-10-08 12:31:20,833 INFO [train.py:1159] (0/2) Computing validation loss 2024-10-08 12:31:31,948 INFO [train.py:1168] (0/2) Epoch 29, validation: loss=0.1678, simple_loss=0.2795, pruned_loss=0.02803, over 1382211.00 frames. 2024-10-08 12:31:31,948 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB 2024-10-08 12:31:53,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=274296.0, ans=0.125 2024-10-08 12:32:20,537 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.47 vs. limit=15.0 2024-10-08 12:32:38,737 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.756e+02 4.232e+02 4.789e+02 5.478e+02 9.311e+02, threshold=9.579e+02, percent-clipped=0.0 2024-10-08 12:33:05,276 INFO [train.py:1136] (0/2) Epoch 29, batch 50, loss[loss=0.1763, simple_loss=0.27, pruned_loss=0.04128, over 87304.00 frames. ], tot_loss[loss=0.1821, simple_loss=0.2807, pruned_loss=0.0417, over 3875448.44 frames. 
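
checkpoint.py:75 writes one file per epoch under zipformer/exp. A minimal sketch of what such a save involves; the real checkpoint stores additional state (sampler position, grad scaler, averaged model), which is assumed here rather than shown in the log.

from pathlib import Path
import torch

def save_epoch_checkpoint(exp_dir, epoch, model, optimizer, scheduler):
    """Write exp_dir/epoch-N.pt at the end of each epoch (sketch)."""
    ckpt = {
        "epoch": epoch,
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "scheduler": scheduler.state_dict(),
    }
    path = Path(exp_dir) / f"epoch-{epoch}.pt"
    torch.save(ckpt, path)  # e.g. zipformer/exp/epoch-28.pt
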
], batch size: 280, lr: 8.84e-03, grad_scale: 32.0 2024-10-08 12:33:12,467 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=274776.0, ans=0.125 2024-10-08 12:33:31,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=274896.0, ans=0.125 2024-10-08 12:33:39,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=274896.0, ans=10.0 2024-10-08 12:33:44,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=275016.0, ans=0.0 2024-10-08 12:33:49,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=275016.0, ans=0.125 2024-10-08 12:33:56,327 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=275016.0, ans=0.0 2024-10-08 12:34:25,404 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=275256.0, ans=0.1 2024-10-08 12:34:42,259 INFO [train.py:1136] (0/2) Epoch 29, batch 100, loss[loss=0.1869, simple_loss=0.2848, pruned_loss=0.04451, over 87044.00 frames. ], tot_loss[loss=0.1831, simple_loss=0.282, pruned_loss=0.04205, over 6819691.11 frames. ], batch size: 548, lr: 8.83e-03, grad_scale: 32.0 2024-10-08 12:34:46,860 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.81 vs. limit=15.0 2024-10-08 12:35:13,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=275496.0, ans=0.125 2024-10-08 12:35:49,147 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.444e+02 3.973e+02 4.438e+02 4.877e+02 6.261e+02, threshold=8.875e+02, percent-clipped=0.0 2024-10-08 12:35:52,110 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.64 vs. limit=22.5 2024-10-08 12:35:54,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=275856.0, ans=0.1 2024-10-08 12:36:15,796 INFO [train.py:1136] (0/2) Epoch 29, batch 150, loss[loss=0.1718, simple_loss=0.2764, pruned_loss=0.03356, over 87245.00 frames. ], tot_loss[loss=0.1831, simple_loss=0.282, pruned_loss=0.0421, over 9113107.87 frames. ], batch size: 517, lr: 8.82e-03, grad_scale: 32.0 2024-10-08 12:36:48,793 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.51 vs. limit=15.0 2024-10-08 12:37:49,866 INFO [train.py:1136] (0/2) Epoch 29, batch 200, loss[loss=0.1845, simple_loss=0.2812, pruned_loss=0.04393, over 87403.00 frames. ], tot_loss[loss=0.1838, simple_loss=0.2829, pruned_loss=0.04231, over 10875518.55 frames. 
], batch size: 313, lr: 8.82e-03, grad_scale: 16.0 2024-10-08 12:38:00,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=276576.0, ans=0.07 2024-10-08 12:38:33,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=276816.0, ans=0.2 2024-10-08 12:38:45,576 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-10-08 12:38:45,960 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.57 vs. limit=10.0 2024-10-08 12:39:01,052 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.443e+02 4.036e+02 4.585e+02 5.370e+02 8.658e+02, threshold=9.170e+02, percent-clipped=0.0 2024-10-08 12:39:07,781 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-10-08 12:39:11,952 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.15 vs. limit=22.5 2024-10-08 12:39:26,309 INFO [train.py:1136] (0/2) Epoch 29, batch 250, loss[loss=0.1721, simple_loss=0.2629, pruned_loss=0.04068, over 85781.00 frames. ], tot_loss[loss=0.1839, simple_loss=0.2829, pruned_loss=0.0424, over 12270375.16 frames. ], batch size: 180, lr: 8.81e-03, grad_scale: 16.0 2024-10-08 12:39:50,014 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=277296.0, ans=0.2 2024-10-08 12:39:50,039 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=277296.0, ans=0.125 2024-10-08 12:39:53,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=277296.0, ans=0.125 2024-10-08 12:40:01,739 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.46 vs. limit=22.5 2024-10-08 12:40:04,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=277416.0, ans=0.1 2024-10-08 12:40:10,075 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=277416.0, ans=0.125 2024-10-08 12:40:15,731 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.12 vs. limit=15.0 2024-10-08 12:40:25,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=277536.0, ans=0.0 2024-10-08 12:41:03,366 INFO [train.py:1136] (0/2) Epoch 29, batch 300, loss[loss=0.1767, simple_loss=0.2742, pruned_loss=0.03959, over 87018.00 frames. ], tot_loss[loss=0.1848, simple_loss=0.2837, pruned_loss=0.04296, over 13273790.11 frames. ], batch size: 350, lr: 8.80e-03, grad_scale: 16.0 2024-10-08 12:41:12,710 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-10-08 12:41:14,931 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.70 vs. 
limit=22.5 2024-10-08 12:41:20,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=277776.0, ans=0.125 2024-10-08 12:41:22,073 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=277896.0, ans=0.0 2024-10-08 12:41:48,652 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=278016.0, ans=0.1 2024-10-08 12:42:14,436 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.653e+02 4.306e+02 4.971e+02 5.705e+02 1.902e+03, threshold=9.943e+02, percent-clipped=1.0 2024-10-08 12:42:28,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=278256.0, ans=0.1 2024-10-08 12:42:32,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=278256.0, ans=0.0 2024-10-08 12:42:34,557 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.25 vs. limit=22.5 2024-10-08 12:42:38,990 INFO [train.py:1136] (0/2) Epoch 29, batch 350, loss[loss=0.1686, simple_loss=0.2734, pruned_loss=0.03191, over 87237.00 frames. ], tot_loss[loss=0.1841, simple_loss=0.2832, pruned_loss=0.04255, over 14097009.23 frames. ], batch size: 415, lr: 8.79e-03, grad_scale: 16.0 2024-10-08 12:42:42,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=278376.0, ans=0.125 2024-10-08 12:42:42,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=278376.0, ans=0.125 2024-10-08 12:42:56,304 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=278496.0, ans=0.1 2024-10-08 12:43:01,596 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=278496.0, ans=0.2 2024-10-08 12:43:03,327 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=278496.0, ans=0.09899494936611666 2024-10-08 12:43:29,030 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=278616.0, ans=0.0 2024-10-08 12:43:52,004 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=278856.0, ans=0.125 2024-10-08 12:44:15,864 INFO [train.py:1136] (0/2) Epoch 29, batch 400, loss[loss=0.2078, simple_loss=0.2996, pruned_loss=0.058, over 69479.00 frames. ], tot_loss[loss=0.1837, simple_loss=0.2827, pruned_loss=0.04231, over 14764738.23 frames. ], batch size: 1960, lr: 8.78e-03, grad_scale: 32.0 2024-10-08 12:44:16,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=278976.0, ans=0.125 2024-10-08 12:44:23,611 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.50 vs. limit=15.0 2024-10-08 12:44:25,430 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.09 vs. 
limit=22.5 2024-10-08 12:45:23,395 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.631e+02 4.203e+02 4.507e+02 5.138e+02 7.712e+02, threshold=9.014e+02, percent-clipped=0.0 2024-10-08 12:45:48,523 INFO [train.py:1136] (0/2) Epoch 29, batch 450, loss[loss=0.1877, simple_loss=0.2884, pruned_loss=0.0435, over 86416.00 frames. ], tot_loss[loss=0.1835, simple_loss=0.2827, pruned_loss=0.04218, over 15287674.81 frames. ], batch size: 620, lr: 8.77e-03, grad_scale: 32.0 2024-10-08 12:45:58,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=279576.0, ans=0.0 2024-10-08 12:46:23,709 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=279816.0, ans=0.125 2024-10-08 12:46:31,411 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-10-08 12:46:41,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=279816.0, ans=0.2 2024-10-08 12:46:49,763 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.79 vs. limit=22.5 2024-10-08 12:47:16,380 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=280056.0, ans=0.125 2024-10-08 12:47:23,594 INFO [train.py:1136] (0/2) Epoch 29, batch 500, loss[loss=0.1694, simple_loss=0.2618, pruned_loss=0.0385, over 87299.00 frames. ], tot_loss[loss=0.1836, simple_loss=0.283, pruned_loss=0.04212, over 15713689.32 frames. ], batch size: 280, lr: 8.76e-03, grad_scale: 32.0 2024-10-08 12:48:19,295 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=280536.0, ans=0.1 2024-10-08 12:48:32,193 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.464e+02 4.242e+02 4.793e+02 5.394e+02 9.265e+02, threshold=9.587e+02, percent-clipped=1.0 2024-10-08 12:48:49,768 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.50 vs. limit=22.5 2024-10-08 12:48:57,316 INFO [train.py:1136] (0/2) Epoch 29, batch 550, loss[loss=0.1711, simple_loss=0.27, pruned_loss=0.03614, over 87133.00 frames. ], tot_loss[loss=0.183, simple_loss=0.2825, pruned_loss=0.04173, over 16039789.34 frames. ], batch size: 264, lr: 8.76e-03, grad_scale: 32.0 2024-10-08 12:48:59,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=280776.0, ans=0.025 2024-10-08 12:49:07,274 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=280776.0, ans=0.09899494936611666 2024-10-08 12:49:37,310 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=281016.0, ans=0.2 2024-10-08 12:49:57,866 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.32 vs. limit=15.0 2024-10-08 12:50:33,724 INFO [train.py:1136] (0/2) Epoch 29, batch 600, loss[loss=0.1853, simple_loss=0.2853, pruned_loss=0.04262, over 86995.00 frames. ], tot_loss[loss=0.1832, simple_loss=0.2827, pruned_loss=0.04186, over 16262985.03 frames. 
], batch size: 583, lr: 8.75e-03, grad_scale: 8.0 2024-10-08 12:50:39,649 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-10-08 12:50:50,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=281376.0, ans=0.125 2024-10-08 12:50:50,727 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=281376.0, ans=0.025 2024-10-08 12:50:59,336 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-10-08 12:51:06,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=281496.0, ans=0.2 2024-10-08 12:51:48,902 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.581e+02 4.119e+02 4.493e+02 4.965e+02 7.138e+02, threshold=8.986e+02, percent-clipped=0.0 2024-10-08 12:51:53,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=281856.0, ans=0.125 2024-10-08 12:51:56,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=281856.0, ans=0.125 2024-10-08 12:52:10,607 INFO [train.py:1136] (0/2) Epoch 29, batch 650, loss[loss=0.2089, simple_loss=0.3005, pruned_loss=0.05867, over 69270.00 frames. ], tot_loss[loss=0.1833, simple_loss=0.2828, pruned_loss=0.04188, over 16428806.79 frames. ], batch size: 1960, lr: 8.74e-03, grad_scale: 8.0 2024-10-08 12:52:11,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=281976.0, ans=0.1 2024-10-08 12:52:13,389 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.09 vs. limit=15.0 2024-10-08 12:52:24,173 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=281976.0, ans=0.125 2024-10-08 12:52:32,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=282096.0, ans=0.125 2024-10-08 12:52:33,157 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.13 vs. limit=15.0 2024-10-08 12:53:09,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=282336.0, ans=0.0 2024-10-08 12:53:18,015 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.10 vs. limit=22.5 2024-10-08 12:53:31,054 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.00 vs. limit=15.0 2024-10-08 12:53:34,883 INFO [train.py:1136] (0/2) Epoch 29, batch 700, loss[loss=0.1801, simple_loss=0.2783, pruned_loss=0.04095, over 87119.00 frames. ], tot_loss[loss=0.1832, simple_loss=0.2827, pruned_loss=0.04189, over 16576510.50 frames. 
], batch size: 330, lr: 8.73e-03, grad_scale: 8.0 2024-10-08 12:54:23,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=282936.0, ans=0.125 2024-10-08 12:54:34,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=282936.0, ans=0.125 2024-10-08 12:54:37,568 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=282936.0, ans=0.125 2024-10-08 12:54:38,802 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.468e+02 4.146e+02 4.676e+02 5.510e+02 9.662e+02, threshold=9.351e+02, percent-clipped=1.0 2024-10-08 12:54:58,696 INFO [train.py:1136] (0/2) Epoch 29, batch 750, loss[loss=0.1947, simple_loss=0.2977, pruned_loss=0.04585, over 85611.00 frames. ], tot_loss[loss=0.1831, simple_loss=0.2825, pruned_loss=0.04178, over 16687770.90 frames. ], batch size: 787, lr: 8.72e-03, grad_scale: 8.0 2024-10-08 12:55:03,016 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.93 vs. limit=15.0 2024-10-08 12:55:27,883 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-10-08 12:55:52,363 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.53 vs. limit=15.0 2024-10-08 12:56:05,862 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=283656.0, ans=0.125 2024-10-08 12:56:20,182 INFO [train.py:1136] (0/2) Epoch 29, batch 800, loss[loss=0.1775, simple_loss=0.2763, pruned_loss=0.03938, over 87078.00 frames. ], tot_loss[loss=0.1833, simple_loss=0.2828, pruned_loss=0.04185, over 16784583.56 frames. ], batch size: 330, lr: 8.71e-03, grad_scale: 16.0 2024-10-08 12:56:26,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=283776.0, ans=0.2 2024-10-08 12:56:33,944 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.27 vs. limit=15.0 2024-10-08 12:56:41,607 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=283896.0, ans=0.125 2024-10-08 12:56:45,806 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-29.pt 2024-10-08 12:57:41,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=283968.0, ans=0.0 2024-10-08 12:57:42,333 INFO [train.py:1136] (0/2) Epoch 30, batch 0, loss[loss=0.1736, simple_loss=0.2755, pruned_loss=0.03582, over 87362.00 frames. ], tot_loss[loss=0.1736, simple_loss=0.2755, pruned_loss=0.03582, over 87362.00 frames. ], batch size: 393, lr: 8.56e-03, grad_scale: 32.0 2024-10-08 12:57:42,334 INFO [train.py:1159] (0/2) Computing validation loss 2024-10-08 12:57:46,727 INFO [zipformer.py:1883] (0/2) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.7570, 1.9344, 1.7646, 1.7203, 1.9952, 1.9106, 1.8633, 1.4598], device='cuda:0') 2024-10-08 12:57:53,440 INFO [train.py:1168] (0/2) Epoch 30, validation: loss=0.1668, simple_loss=0.2777, pruned_loss=0.02793, over 1382211.00 frames. 
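The attn_weights_entropy tensor printed just above, right before the Epoch 30 validation loss, is a per-head attention diagnostic: one value per attention head of that layer, each the average entropy of the head's attention distribution. Low entropy means the head attends sharply to a few frames; high entropy means it spreads attention broadly. The sketch below shows one plausible way to compute such a statistic; the function name and tensor shapes are assumptions for illustration, not the zipformer.py:1883 implementation.

import torch

def attn_weights_entropy(attn: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
    # attn: (num_heads, batch, tgt_len, src_len); each row along the last
    # dim is a softmax distribution over source positions.
    ent = -(attn * (attn + eps).log()).sum(dim=-1)  # (num_heads, batch, tgt_len)
    return ent.mean(dim=(1, 2))                     # one average entropy per head

weights = torch.softmax(torch.randn(8, 4, 60, 60), dim=-1)
print(attn_weights_entropy(weights))  # 8 values, one per head, cf. the tensor above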
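The recurring WARNING [optim.py:487] lines report adaptive gradient clipping: five quantiles (min, 25%, median, 75%, max) of recent gradient norms, a clipping threshold, and how often clipping fired. In every such line in this log the threshold equals Clipping_scale (2.0) times the logged median, e.g. 2.0 x 4.507e+02 = 9.014e+02, so the threshold tracks the typical gradient norm rather than being a fixed constant. A minimal sketch of that scheme follows; the class name, window size, and percent-clipped bookkeeping are illustrative assumptions, not the optim.py code.

from collections import deque

import torch

class MedianGradClipper:
    # Sketch: threshold = clipping_scale * median of recent total grad norms.
    def __init__(self, clipping_scale: float = 2.0, window: int = 400):
        self.clipping_scale = clipping_scale
        self.recent_norms = deque(maxlen=window)  # hypothetical history window

    def clip_(self, params):
        grads = [p.grad for p in params if p.grad is not None]
        total_norm = torch.norm(torch.stack([g.norm() for g in grads]))
        self.recent_norms.append(total_norm.item())

        hist = torch.tensor(list(self.recent_norms))
        quartiles = torch.quantile(hist, torch.linspace(0.0, 1.0, 5))
        threshold = self.clipping_scale * quartiles[2]  # 2.0 x median, as logged
        if total_norm > threshold:
            for g in grads:                             # rescale grads in place
                g.mul_(threshold / total_norm)
        percent_clipped = 100.0 * (hist > threshold).float().mean().item()
        return quartiles, threshold.item(), percent_clipped

Because the threshold adapts to the recent norm distribution, percent-clipped stays near zero in steady state (as in most lines above) and only spikes when a batch produces an unusually large gradient.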
2024-10-08 12:57:53,440 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB 2024-10-08 12:58:26,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=284088.0, ans=0.125 2024-10-08 12:58:26,140 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=284088.0, ans=0.125 2024-10-08 12:58:30,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=284208.0, ans=0.125 2024-10-08 12:58:35,533 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.615e+02 4.356e+02 5.252e+02 6.213e+02 9.468e+02, threshold=1.050e+03, percent-clipped=1.0 2024-10-08 12:59:15,870 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.61 vs. limit=15.0 2024-10-08 12:59:24,398 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=284448.0, ans=0.125 2024-10-08 12:59:27,204 INFO [train.py:1136] (0/2) Epoch 30, batch 50, loss[loss=0.1656, simple_loss=0.268, pruned_loss=0.03157, over 87265.00 frames. ], tot_loss[loss=0.1809, simple_loss=0.2801, pruned_loss=0.04087, over 3871179.55 frames. ], batch size: 372, lr: 8.55e-03, grad_scale: 32.0 2024-10-08 12:59:28,068 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.31 vs. limit=15.0 2024-10-08 12:59:41,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=284568.0, ans=0.125 2024-10-08 12:59:53,624 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.31 vs. limit=15.0 2024-10-08 13:00:10,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=284808.0, ans=0.125 2024-10-08 13:00:27,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=284928.0, ans=0.1 2024-10-08 13:00:37,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=284928.0, ans=0.125 2024-10-08 13:00:57,181 INFO [train.py:1136] (0/2) Epoch 30, batch 100, loss[loss=0.1875, simple_loss=0.2914, pruned_loss=0.0418, over 84581.00 frames. ], tot_loss[loss=0.181, simple_loss=0.281, pruned_loss=0.04047, over 6839323.95 frames. ], batch size: 958, lr: 8.55e-03, grad_scale: 32.0 2024-10-08 13:01:01,438 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.76 vs. 
limit=15.0 2024-10-08 13:01:16,648 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=285168.0, ans=0.125 2024-10-08 13:01:41,905 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.353e+02 4.118e+02 4.437e+02 5.013e+02 6.460e+02, threshold=8.875e+02, percent-clipped=0.0 2024-10-08 13:02:08,341 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=285528.0, ans=0.0 2024-10-08 13:02:13,675 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=285528.0, ans=0.0 2024-10-08 13:02:20,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=285648.0, ans=0.125 2024-10-08 13:02:33,911 INFO [train.py:1136] (0/2) Epoch 30, batch 150, loss[loss=0.1951, simple_loss=0.2979, pruned_loss=0.04618, over 83372.00 frames. ], tot_loss[loss=0.1826, simple_loss=0.2821, pruned_loss=0.04158, over 9083994.85 frames. ], batch size: 1077, lr: 8.54e-03, grad_scale: 32.0 2024-10-08 13:02:41,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=285768.0, ans=0.0 2024-10-08 13:03:11,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=286008.0, ans=0.125 2024-10-08 13:03:47,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=286248.0, ans=0.0 2024-10-08 13:03:49,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=286248.0, ans=0.2 2024-10-08 13:03:55,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=286248.0, ans=0.0 2024-10-08 13:03:58,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=286248.0, ans=0.95 2024-10-08 13:04:07,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=286368.0, ans=0.1 2024-10-08 13:04:08,441 INFO [train.py:1136] (0/2) Epoch 30, batch 200, loss[loss=0.1694, simple_loss=0.2761, pruned_loss=0.03131, over 87241.00 frames. ], tot_loss[loss=0.1822, simple_loss=0.2816, pruned_loss=0.04137, over 10866087.55 frames. ], batch size: 439, lr: 8.53e-03, grad_scale: 32.0 2024-10-08 13:04:22,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=286368.0, ans=0.125 2024-10-08 13:04:54,640 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.354e+02 4.207e+02 4.766e+02 5.806e+02 7.947e+02, threshold=9.533e+02, percent-clipped=0.0 2024-10-08 13:05:24,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=286848.0, ans=0.0 2024-10-08 13:05:30,125 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=286848.0, ans=0.125 2024-10-08 13:05:44,136 INFO [train.py:1136] (0/2) Epoch 30, batch 250, loss[loss=0.1882, simple_loss=0.2897, pruned_loss=0.04335, over 85996.00 frames. ], tot_loss[loss=0.1826, simple_loss=0.2821, pruned_loss=0.04155, over 12247692.48 frames. 
], batch size: 721, lr: 8.52e-03, grad_scale: 16.0 2024-10-08 13:06:08,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=287088.0, ans=0.125 2024-10-08 13:07:15,269 INFO [train.py:1136] (0/2) Epoch 30, batch 300, loss[loss=0.1771, simple_loss=0.2833, pruned_loss=0.03548, over 87333.00 frames. ], tot_loss[loss=0.1819, simple_loss=0.2814, pruned_loss=0.04123, over 13359596.31 frames. ], batch size: 490, lr: 8.51e-03, grad_scale: 16.0 2024-10-08 13:07:34,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=287568.0, ans=0.0 2024-10-08 13:07:59,530 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.69 vs. limit=15.0 2024-10-08 13:08:01,607 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.597e+02 4.158e+02 4.503e+02 5.147e+02 1.097e+03, threshold=9.006e+02, percent-clipped=1.0 2024-10-08 13:08:22,388 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=287928.0, ans=0.1 2024-10-08 13:08:25,426 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/checkpoint-24000.pt 2024-10-08 13:08:31,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=287928.0, ans=0.125 2024-10-08 13:08:53,587 INFO [train.py:1136] (0/2) Epoch 30, batch 350, loss[loss=0.1773, simple_loss=0.267, pruned_loss=0.04384, over 85726.00 frames. ], tot_loss[loss=0.1817, simple_loss=0.281, pruned_loss=0.04119, over 14200491.43 frames. ], batch size: 180, lr: 8.50e-03, grad_scale: 16.0 2024-10-08 13:09:32,047 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=288408.0, ans=0.125 2024-10-08 13:10:07,386 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.10 vs. limit=10.0 2024-10-08 13:10:30,524 INFO [train.py:1136] (0/2) Epoch 30, batch 400, loss[loss=0.1906, simple_loss=0.2935, pruned_loss=0.0439, over 84570.00 frames. ], tot_loss[loss=0.1829, simple_loss=0.282, pruned_loss=0.04187, over 14800745.85 frames. 
], batch size: 957, lr: 8.50e-03, grad_scale: 32.0 2024-10-08 13:10:41,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=288768.0, ans=0.2 2024-10-08 13:11:12,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=289008.0, ans=0.125 2024-10-08 13:11:16,712 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.501e+02 4.115e+02 4.480e+02 5.065e+02 7.668e+02, threshold=8.961e+02, percent-clipped=0.0 2024-10-08 13:11:26,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=289008.0, ans=0.125 2024-10-08 13:12:02,087 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-10-08 13:12:05,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=289368.0, ans=0.1 2024-10-08 13:12:06,729 INFO [train.py:1136] (0/2) Epoch 30, batch 450, loss[loss=0.1652, simple_loss=0.271, pruned_loss=0.02974, over 87083.00 frames. ], tot_loss[loss=0.1827, simple_loss=0.2818, pruned_loss=0.04183, over 15316174.01 frames. ], batch size: 517, lr: 8.49e-03, grad_scale: 32.0 2024-10-08 13:12:10,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=289368.0, ans=0.125 2024-10-08 13:12:14,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=289368.0, ans=0.2 2024-10-08 13:12:36,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=289488.0, ans=0.125 2024-10-08 13:13:00,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=289608.0, ans=0.125 2024-10-08 13:13:02,390 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.63 vs. limit=15.0 2024-10-08 13:13:19,082 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.77 vs. limit=15.0 2024-10-08 13:13:31,297 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=289848.0, ans=0.1 2024-10-08 13:13:42,673 INFO [train.py:1136] (0/2) Epoch 30, batch 500, loss[loss=0.1782, simple_loss=0.2722, pruned_loss=0.04212, over 87280.00 frames. ], tot_loss[loss=0.183, simple_loss=0.2821, pruned_loss=0.04194, over 15682046.35 frames. 
], batch size: 280, lr: 8.48e-03, grad_scale: 16.0 2024-10-08 13:13:51,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=289968.0, ans=0.0 2024-10-08 13:14:07,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=290088.0, ans=0.125 2024-10-08 13:14:09,488 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=290088.0, ans=0.125 2024-10-08 13:14:15,128 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-10-08 13:14:30,396 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.534e+02 4.229e+02 4.725e+02 5.442e+02 8.478e+02, threshold=9.450e+02, percent-clipped=0.0 2024-10-08 13:14:59,154 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.79 vs. limit=15.0 2024-10-08 13:15:07,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=290448.0, ans=0.025 2024-10-08 13:15:07,796 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.08 vs. limit=15.0 2024-10-08 13:15:13,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=290448.0, ans=0.125 2024-10-08 13:15:18,806 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-10-08 13:15:19,913 INFO [train.py:1136] (0/2) Epoch 30, batch 550, loss[loss=0.1705, simple_loss=0.2755, pruned_loss=0.03277, over 87364.00 frames. ], tot_loss[loss=0.1825, simple_loss=0.2817, pruned_loss=0.04168, over 16001208.23 frames. ], batch size: 415, lr: 8.47e-03, grad_scale: 16.0 2024-10-08 13:16:06,240 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-10-08 13:16:06,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=290808.0, ans=0.0 2024-10-08 13:16:25,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=290928.0, ans=0.125 2024-10-08 13:16:26,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=290928.0, ans=0.1 2024-10-08 13:16:36,657 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-10-08 13:16:43,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=291048.0, ans=0.125 2024-10-08 13:16:57,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=291168.0, ans=0.125 2024-10-08 13:16:58,269 INFO [train.py:1136] (0/2) Epoch 30, batch 600, loss[loss=0.1736, simple_loss=0.2716, pruned_loss=0.03786, over 87102.00 frames. ], tot_loss[loss=0.183, simple_loss=0.2825, pruned_loss=0.04178, over 16228367.32 frames. 
], batch size: 264, lr: 8.46e-03, grad_scale: 16.0 2024-10-08 13:17:37,716 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=291408.0, ans=0.025 2024-10-08 13:17:46,625 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.541e+02 4.547e+02 5.227e+02 5.883e+02 9.177e+02, threshold=1.045e+03, percent-clipped=0.0 2024-10-08 13:18:34,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=291768.0, ans=0.125 2024-10-08 13:18:35,993 INFO [train.py:1136] (0/2) Epoch 30, batch 650, loss[loss=0.1954, simple_loss=0.2988, pruned_loss=0.04602, over 83479.00 frames. ], tot_loss[loss=0.1832, simple_loss=0.2826, pruned_loss=0.0419, over 16355135.54 frames. ], batch size: 1077, lr: 8.46e-03, grad_scale: 16.0 2024-10-08 13:18:44,328 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=291768.0, ans=0.0 2024-10-08 13:19:13,170 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=292008.0, ans=0.5 2024-10-08 13:19:19,524 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-10-08 13:19:34,869 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.34 vs. limit=15.0 2024-10-08 13:19:37,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=292128.0, ans=0.125 2024-10-08 13:19:59,553 INFO [train.py:1136] (0/2) Epoch 30, batch 700, loss[loss=0.1995, simple_loss=0.302, pruned_loss=0.04854, over 81840.00 frames. ], tot_loss[loss=0.1827, simple_loss=0.2821, pruned_loss=0.04164, over 16509772.36 frames. ], batch size: 1245, lr: 8.45e-03, grad_scale: 16.0 2024-10-08 13:20:04,759 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.06 vs. limit=15.0 2024-10-08 13:20:06,072 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=292368.0, ans=0.0 2024-10-08 13:20:13,953 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=292488.0, ans=0.035 2024-10-08 13:20:32,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=292608.0, ans=0.2 2024-10-08 13:20:38,676 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.543e+02 4.266e+02 4.697e+02 5.333e+02 9.907e+02, threshold=9.393e+02, percent-clipped=0.0 2024-10-08 13:20:42,613 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.57 vs. 
limit=15.0 2024-10-08 13:20:46,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=292728.0, ans=0.2 2024-10-08 13:20:53,234 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=292728.0, ans=0.025 2024-10-08 13:21:05,948 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=292848.0, ans=0.125 2024-10-08 13:21:10,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=292848.0, ans=0.0 2024-10-08 13:21:16,942 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=292848.0, ans=0.125 2024-10-08 13:21:21,406 INFO [train.py:1136] (0/2) Epoch 30, batch 750, loss[loss=0.1762, simple_loss=0.2754, pruned_loss=0.03852, over 87226.00 frames. ], tot_loss[loss=0.1827, simple_loss=0.2821, pruned_loss=0.04163, over 16589030.20 frames. ], batch size: 313, lr: 8.44e-03, grad_scale: 8.0 2024-10-08 13:21:39,318 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.57 vs. limit=15.0 2024-10-08 13:22:27,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=293448.0, ans=0.125 2024-10-08 13:22:44,389 INFO [train.py:1136] (0/2) Epoch 30, batch 800, loss[loss=0.1737, simple_loss=0.2795, pruned_loss=0.03395, over 87447.00 frames. ], tot_loss[loss=0.1823, simple_loss=0.2819, pruned_loss=0.04131, over 16735559.86 frames. ], batch size: 464, lr: 8.43e-03, grad_scale: 16.0 2024-10-08 13:22:59,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=293688.0, ans=0.0 2024-10-08 13:23:10,003 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-30.pt 2024-10-08 13:24:09,127 INFO [train.py:1136] (0/2) Epoch 31, batch 0, loss[loss=0.1793, simple_loss=0.2797, pruned_loss=0.03943, over 87433.00 frames. ], tot_loss[loss=0.1793, simple_loss=0.2797, pruned_loss=0.03943, over 87433.00 frames. ], batch size: 393, lr: 8.29e-03, grad_scale: 32.0 2024-10-08 13:24:09,128 INFO [train.py:1159] (0/2) Computing validation loss 2024-10-08 13:24:20,854 INFO [train.py:1168] (0/2) Epoch 31, validation: loss=0.1666, simple_loss=0.2772, pruned_loss=0.02793, over 1382211.00 frames. 
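Checkpoints appear in two flavors in this log: epoch-NN.pt written at each epoch boundary (the save to zipformer/exp/epoch-30.pt just above) and checkpoint-NNNN.pt written every fixed number of optimizer steps (the save to zipformer/exp/checkpoint-24000.pt earlier). A minimal sketch of what such a save plausibly bundles; the field names are assumptions for illustration, not icefall's checkpoint.py.

from pathlib import Path

import torch

def save_checkpoint(exp_dir: Path, model, optimizer, scheduler, scaler,
                    epoch: int, batch_idx_train: int, tag: str):
    # Saving optimizer/scheduler/GradScaler state alongside the weights is
    # what lets a resumed run continue the lr schedule and the dynamic
    # grad_scale from where the log left off, not just the parameters.
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "scheduler": scheduler.state_dict(),
            "grad_scaler": scaler.state_dict(),
            "epoch": epoch,
            "batch_idx_train": batch_idx_train,
        },
        exp_dir / f"{tag}.pt",
    )

# e.g. save_checkpoint(Path("zipformer/exp"), ..., tag="epoch-30") at an epoch
# boundary, or tag=f"checkpoint-{batch_idx_train}" for the step-based saves.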
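The dense ScheduledFloat lines are not errors: they trace hyperparameters (dropout probabilities, skip rates, balancer probabilities) that are functions of the global batch count rather than constants, so regularization can be strong early in training and relax later. The sketch below assumes simple piecewise-linear interpolation between (batch_count, value) breakpoints; it illustrates the idea only and is not the scaling.py:214 class.

import bisect

class ScheduledFloat:
    # value(batch_count) interpolated linearly between breakpoints,
    # clamped to the end values outside the breakpoint range.
    def __init__(self, *points):
        self.xs = [x for x, _ in points]
        self.ys = [y for _, y in points]

    def __call__(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# Hypothetical schedule: dropout 0.3 at the start, 0.1 after 20k batches.
dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
print(dropout_p(280536.0))  # -> 0.1, cf. the dropout_p ans=0.1 entries above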
2024-10-08 13:24:20,854 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB 2024-10-08 13:24:39,793 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.669e+02 4.257e+02 4.749e+02 5.521e+02 1.005e+03, threshold=9.498e+02, percent-clipped=2.0 2024-10-08 13:25:24,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=294120.0, ans=15.0 2024-10-08 13:25:27,318 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=294120.0, ans=0.125 2024-10-08 13:25:32,549 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=294120.0, ans=0.125 2024-10-08 13:25:47,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=294240.0, ans=0.125 2024-10-08 13:25:58,213 INFO [train.py:1136] (0/2) Epoch 31, batch 50, loss[loss=0.202, simple_loss=0.3038, pruned_loss=0.0501, over 82059.00 frames. ], tot_loss[loss=0.1838, simple_loss=0.2829, pruned_loss=0.04229, over 3827143.86 frames. ], batch size: 1245, lr: 8.28e-03, grad_scale: 16.0 2024-10-08 13:26:48,054 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=294600.0, ans=0.1 2024-10-08 13:27:16,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=294840.0, ans=0.125 2024-10-08 13:27:17,440 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.56 vs. limit=12.0 2024-10-08 13:27:30,531 INFO [train.py:1136] (0/2) Epoch 31, batch 100, loss[loss=0.1709, simple_loss=0.2764, pruned_loss=0.03273, over 87434.00 frames. ], tot_loss[loss=0.1801, simple_loss=0.2801, pruned_loss=0.04004, over 6819704.93 frames. ], batch size: 490, lr: 8.28e-03, grad_scale: 16.0 2024-10-08 13:27:48,771 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.23 vs. limit=6.0 2024-10-08 13:27:51,168 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.496e+02 4.213e+02 4.765e+02 5.310e+02 7.209e+02, threshold=9.529e+02, percent-clipped=0.0 2024-10-08 13:27:51,808 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=295080.0, ans=0.125 2024-10-08 13:27:54,933 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=295080.0, ans=0.125 2024-10-08 13:28:01,929 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=295080.0, ans=22.5 2024-10-08 13:29:03,008 INFO [train.py:1136] (0/2) Epoch 31, batch 150, loss[loss=0.1922, simple_loss=0.2937, pruned_loss=0.04538, over 85867.00 frames. ], tot_loss[loss=0.1812, simple_loss=0.2806, pruned_loss=0.04088, over 9104703.31 frames. ], batch size: 721, lr: 8.27e-03, grad_scale: 16.0 2024-10-08 13:30:39,524 INFO [train.py:1136] (0/2) Epoch 31, batch 200, loss[loss=0.1686, simple_loss=0.2734, pruned_loss=0.03193, over 87392.00 frames. ], tot_loss[loss=0.1815, simple_loss=0.281, pruned_loss=0.04099, over 10882685.72 frames. 
], batch size: 464, lr: 8.26e-03, grad_scale: 16.0 2024-10-08 13:30:43,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=296160.0, ans=0.0 2024-10-08 13:31:00,566 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.625e+02 4.151e+02 4.637e+02 5.363e+02 7.371e+02, threshold=9.273e+02, percent-clipped=0.0 2024-10-08 13:31:15,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=296280.0, ans=0.07 2024-10-08 13:31:27,884 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.23 vs. limit=6.0 2024-10-08 13:32:08,146 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=296640.0, ans=0.125 2024-10-08 13:32:14,447 INFO [train.py:1136] (0/2) Epoch 31, batch 250, loss[loss=0.1719, simple_loss=0.2775, pruned_loss=0.03314, over 87213.00 frames. ], tot_loss[loss=0.1814, simple_loss=0.2811, pruned_loss=0.0408, over 12269004.55 frames. ], batch size: 517, lr: 8.25e-03, grad_scale: 16.0 2024-10-08 13:32:17,556 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.78 vs. limit=8.0 2024-10-08 13:32:19,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=296760.0, ans=0.09899494936611666 2024-10-08 13:32:19,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=296760.0, ans=0.125 2024-10-08 13:33:24,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=297120.0, ans=0.125 2024-10-08 13:33:27,688 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=297240.0, ans=0.125 2024-10-08 13:33:44,550 INFO [train.py:1136] (0/2) Epoch 31, batch 300, loss[loss=0.1709, simple_loss=0.2761, pruned_loss=0.03288, over 87429.00 frames. ], tot_loss[loss=0.1806, simple_loss=0.2804, pruned_loss=0.04043, over 13374039.62 frames. 
], batch size: 439, lr: 8.24e-03, grad_scale: 16.0 2024-10-08 13:34:03,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=297360.0, ans=0.125 2024-10-08 13:34:08,200 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.477e+02 3.959e+02 4.364e+02 4.936e+02 7.497e+02, threshold=8.727e+02, percent-clipped=0.0 2024-10-08 13:34:22,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=297600.0, ans=0.025 2024-10-08 13:34:37,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=297600.0, ans=0.125 2024-10-08 13:35:02,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=297840.0, ans=0.0 2024-10-08 13:35:12,490 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=297840.0, ans=0.05 2024-10-08 13:35:12,577 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=297840.0, ans=0.025 2024-10-08 13:35:18,998 INFO [train.py:1136] (0/2) Epoch 31, batch 350, loss[loss=0.1933, simple_loss=0.296, pruned_loss=0.04529, over 84490.00 frames. ], tot_loss[loss=0.1804, simple_loss=0.2803, pruned_loss=0.04026, over 14219095.96 frames. ], batch size: 957, lr: 8.24e-03, grad_scale: 16.0 2024-10-08 13:36:04,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=298200.0, ans=0.0 2024-10-08 13:36:06,588 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=298200.0, ans=0.125 2024-10-08 13:36:16,541 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=298320.0, ans=0.125 2024-10-08 13:36:42,386 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.76 vs. limit=15.0 2024-10-08 13:36:53,425 INFO [train.py:1136] (0/2) Epoch 31, batch 400, loss[loss=0.1754, simple_loss=0.2747, pruned_loss=0.03804, over 87069.00 frames. ], tot_loss[loss=0.1799, simple_loss=0.2798, pruned_loss=0.03998, over 14880507.95 frames. 
], batch size: 330, lr: 8.23e-03, grad_scale: 32.0 2024-10-08 13:37:12,808 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.457e+02 3.904e+02 4.308e+02 4.673e+02 1.604e+03, threshold=8.615e+02, percent-clipped=1.0 2024-10-08 13:37:23,673 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=298680.0, ans=0.125 2024-10-08 13:37:36,088 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=298800.0, ans=0.125 2024-10-08 13:37:48,532 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=298920.0, ans=0.1 2024-10-08 13:37:53,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=298920.0, ans=0.125 2024-10-08 13:38:15,981 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=299040.0, ans=0.0 2024-10-08 13:38:27,587 INFO [train.py:1136] (0/2) Epoch 31, batch 450, loss[loss=0.1873, simple_loss=0.286, pruned_loss=0.04432, over 86972.00 frames. ], tot_loss[loss=0.1804, simple_loss=0.2799, pruned_loss=0.04048, over 15342748.16 frames. ], batch size: 548, lr: 8.22e-03, grad_scale: 16.0 2024-10-08 13:38:47,721 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.86 vs. limit=15.0 2024-10-08 13:38:49,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=299280.0, ans=0.05 2024-10-08 13:39:25,708 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=299520.0, ans=0.025 2024-10-08 13:39:43,889 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=299640.0, ans=0.125 2024-10-08 13:39:56,630 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.35 vs. limit=15.0 2024-10-08 13:40:00,159 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=299640.0, ans=0.2 2024-10-08 13:40:03,079 INFO [train.py:1136] (0/2) Epoch 31, batch 500, loss[loss=0.1895, simple_loss=0.292, pruned_loss=0.04349, over 85850.00 frames. ], tot_loss[loss=0.18, simple_loss=0.2794, pruned_loss=0.04029, over 15762233.70 frames. ], batch size: 721, lr: 8.21e-03, grad_scale: 16.0 2024-10-08 13:40:16,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=299760.0, ans=0.0 2024-10-08 13:40:23,328 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.541e+02 4.168e+02 4.791e+02 5.737e+02 1.235e+03, threshold=9.582e+02, percent-clipped=1.0 2024-10-08 13:40:51,190 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=300000.0, ans=0.2 2024-10-08 13:40:52,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=300000.0, ans=0.1 2024-10-08 13:40:54,881 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.94 vs. 
limit=22.5 2024-10-08 13:40:58,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=300120.0, ans=0.125 2024-10-08 13:41:06,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=300120.0, ans=0.125 2024-10-08 13:41:25,459 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=300240.0, ans=0.125 2024-10-08 13:41:35,421 INFO [train.py:1136] (0/2) Epoch 31, batch 550, loss[loss=0.1678, simple_loss=0.2708, pruned_loss=0.03243, over 87291.00 frames. ], tot_loss[loss=0.1806, simple_loss=0.28, pruned_loss=0.04056, over 16054462.26 frames. ], batch size: 415, lr: 8.21e-03, grad_scale: 8.0 2024-10-08 13:41:53,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=300480.0, ans=0.125 2024-10-08 13:42:08,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=300480.0, ans=0.0 2024-10-08 13:42:39,634 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.19 vs. limit=6.0 2024-10-08 13:42:42,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=300720.0, ans=0.125 2024-10-08 13:42:44,208 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=300720.0, ans=0.025 2024-10-08 13:42:55,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=300840.0, ans=0.125 2024-10-08 13:43:07,654 INFO [train.py:1136] (0/2) Epoch 31, batch 600, loss[loss=0.1731, simple_loss=0.2637, pruned_loss=0.04126, over 85836.00 frames. ], tot_loss[loss=0.1796, simple_loss=0.279, pruned_loss=0.04004, over 16319259.11 frames. ], batch size: 180, lr: 8.20e-03, grad_scale: 8.0 2024-10-08 13:43:34,890 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.438e+02 3.977e+02 4.372e+02 4.978e+02 7.184e+02, threshold=8.744e+02, percent-clipped=0.0 2024-10-08 13:43:38,774 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=301080.0, ans=0.125 2024-10-08 13:43:40,581 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=301080.0, ans=0.125 2024-10-08 13:44:02,038 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.51 vs. limit=22.5 2024-10-08 13:44:43,734 INFO [train.py:1136] (0/2) Epoch 31, batch 650, loss[loss=0.2062, simple_loss=0.2998, pruned_loss=0.05628, over 69920.00 frames. ], tot_loss[loss=0.1805, simple_loss=0.2797, pruned_loss=0.04065, over 16410466.50 frames. ], batch size: 1960, lr: 8.19e-03, grad_scale: 8.0 2024-10-08 13:44:55,249 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=301560.0, ans=0.125 2024-10-08 13:45:46,910 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.99 vs. 
limit=15.0 2024-10-08 13:45:51,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=301920.0, ans=0.2 2024-10-08 13:45:51,119 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=301920.0, ans=0.125 2024-10-08 13:46:08,586 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.29 vs. limit=15.0 2024-10-08 13:46:14,055 INFO [train.py:1136] (0/2) Epoch 31, batch 700, loss[loss=0.223, simple_loss=0.3169, pruned_loss=0.0645, over 78725.00 frames. ], tot_loss[loss=0.1806, simple_loss=0.28, pruned_loss=0.04059, over 16572657.13 frames. ], batch size: 1493, lr: 8.18e-03, grad_scale: 8.0 2024-10-08 13:46:21,329 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.22 vs. limit=6.0 2024-10-08 13:46:22,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=302160.0, ans=0.125 2024-10-08 13:46:27,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=302160.0, ans=0.2 2024-10-08 13:46:34,718 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.421e+02 4.142e+02 4.736e+02 5.628e+02 7.960e+02, threshold=9.471e+02, percent-clipped=0.0 2024-10-08 13:46:38,195 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=302280.0, ans=0.2 2024-10-08 13:46:46,831 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=302400.0, ans=0.1 2024-10-08 13:46:58,287 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.95 vs. limit=15.0 2024-10-08 13:47:15,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=302520.0, ans=0.0 2024-10-08 13:47:18,561 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=302520.0, ans=0.125 2024-10-08 13:47:20,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=302640.0, ans=0.0 2024-10-08 13:47:36,954 INFO [train.py:1136] (0/2) Epoch 31, batch 750, loss[loss=0.1692, simple_loss=0.2748, pruned_loss=0.03184, over 87217.00 frames. ], tot_loss[loss=0.1806, simple_loss=0.28, pruned_loss=0.04056, over 16683927.05 frames. ], batch size: 439, lr: 8.18e-03, grad_scale: 8.0 2024-10-08 13:47:56,313 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=302880.0, ans=0.1 2024-10-08 13:48:05,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=302880.0, ans=0.2 2024-10-08 13:48:08,108 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.28 vs. 
limit=22.5 2024-10-08 13:48:12,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=303000.0, ans=0.07 2024-10-08 13:48:54,711 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.73 vs. limit=12.0 2024-10-08 13:49:01,710 INFO [train.py:1136] (0/2) Epoch 31, batch 800, loss[loss=0.1921, simple_loss=0.2944, pruned_loss=0.04491, over 81922.00 frames. ], tot_loss[loss=0.1819, simple_loss=0.2814, pruned_loss=0.0412, over 16697084.71 frames. ], batch size: 1245, lr: 8.17e-03, grad_scale: 16.0 2024-10-08 13:49:09,968 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=303360.0, ans=0.125 2024-10-08 13:49:23,214 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.608e+02 4.230e+02 4.726e+02 5.494e+02 8.060e+02, threshold=9.451e+02, percent-clipped=0.0 2024-10-08 13:49:27,552 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-31.pt 2024-10-08 13:50:26,880 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=303552.0, ans=0.125 2024-10-08 13:50:28,263 INFO [train.py:1136] (0/2) Epoch 32, batch 0, loss[loss=0.2046, simple_loss=0.296, pruned_loss=0.05657, over 69719.00 frames. ], tot_loss[loss=0.2046, simple_loss=0.296, pruned_loss=0.05657, over 69719.00 frames. ], batch size: 1960, lr: 8.04e-03, grad_scale: 32.0 2024-10-08 13:50:28,264 INFO [train.py:1159] (0/2) Computing validation loss 2024-10-08 13:50:39,335 INFO [train.py:1168] (0/2) Epoch 32, validation: loss=0.1671, simple_loss=0.2783, pruned_loss=0.02802, over 1382211.00 frames. 2024-10-08 13:50:39,336 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB 2024-10-08 13:50:41,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=303552.0, ans=0.125 2024-10-08 13:51:25,241 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=303792.0, ans=0.125 2024-10-08 13:51:26,979 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=303792.0, ans=0.2 2024-10-08 13:51:28,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=303792.0, ans=0.125 2024-10-08 13:51:30,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=303792.0, ans=0.125 2024-10-08 13:51:33,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=303912.0, ans=0.2 2024-10-08 13:51:42,204 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.19 vs. limit=15.0 2024-10-08 13:52:04,132 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=304032.0, ans=0.125 2024-10-08 13:52:12,114 INFO [train.py:1136] (0/2) Epoch 32, batch 50, loss[loss=0.1866, simple_loss=0.2896, pruned_loss=0.0418, over 83374.00 frames. ], tot_loss[loss=0.1801, simple_loss=0.2804, pruned_loss=0.03985, over 3831405.12 frames. 
], batch size: 1077, lr: 8.03e-03, grad_scale: 32.0 2024-10-08 13:53:03,941 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-10-08 13:53:05,601 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=304392.0, ans=0.125 2024-10-08 13:53:39,862 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.23 vs. limit=15.0 2024-10-08 13:53:42,268 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.463e+02 3.911e+02 4.244e+02 4.884e+02 7.242e+02, threshold=8.487e+02, percent-clipped=0.0 2024-10-08 13:53:47,348 INFO [train.py:1136] (0/2) Epoch 32, batch 100, loss[loss=0.172, simple_loss=0.2638, pruned_loss=0.04008, over 86427.00 frames. ], tot_loss[loss=0.1791, simple_loss=0.2789, pruned_loss=0.03963, over 6816983.98 frames. ], batch size: 213, lr: 8.02e-03, grad_scale: 32.0 2024-10-08 13:54:21,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=304872.0, ans=0.0 2024-10-08 13:54:45,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=305112.0, ans=0.1 2024-10-08 13:54:55,704 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=305112.0, ans=0.125 2024-10-08 13:55:19,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=305352.0, ans=0.0 2024-10-08 13:55:19,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=305352.0, ans=0.125 2024-10-08 13:55:20,391 INFO [train.py:1136] (0/2) Epoch 32, batch 150, loss[loss=0.21, simple_loss=0.3038, pruned_loss=0.0581, over 69570.00 frames. ], tot_loss[loss=0.1793, simple_loss=0.2792, pruned_loss=0.0397, over 9104789.93 frames. ], batch size: 1960, lr: 8.01e-03, grad_scale: 32.0 2024-10-08 13:55:47,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=305472.0, ans=0.125 2024-10-08 13:56:02,143 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.71 vs. limit=15.0 2024-10-08 13:56:11,871 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=305592.0, ans=0.0 2024-10-08 13:56:41,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=305832.0, ans=0.0 2024-10-08 13:56:43,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=305832.0, ans=0.0 2024-10-08 13:56:47,893 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.467e+02 4.102e+02 4.619e+02 5.433e+02 7.461e+02, threshold=9.238e+02, percent-clipped=0.0 2024-10-08 13:56:55,327 INFO [train.py:1136] (0/2) Epoch 32, batch 200, loss[loss=0.1809, simple_loss=0.2835, pruned_loss=0.0392, over 86402.00 frames. ], tot_loss[loss=0.1792, simple_loss=0.2788, pruned_loss=0.03974, over 10902212.41 frames. 
], batch size: 668, lr: 8.01e-03, grad_scale: 32.0 2024-10-08 13:57:25,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=306072.0, ans=0.07 2024-10-08 13:57:27,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=306072.0, ans=0.0 2024-10-08 13:57:58,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=306312.0, ans=0.1 2024-10-08 13:57:59,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=306312.0, ans=0.025 2024-10-08 13:58:04,156 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=306312.0, ans=0.125 2024-10-08 13:58:25,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=306432.0, ans=0.0 2024-10-08 13:58:28,911 INFO [train.py:1136] (0/2) Epoch 32, batch 250, loss[loss=0.1678, simple_loss=0.2691, pruned_loss=0.03324, over 87047.00 frames. ], tot_loss[loss=0.1789, simple_loss=0.2787, pruned_loss=0.03956, over 12295723.86 frames. ], batch size: 350, lr: 8.00e-03, grad_scale: 16.0 2024-10-08 13:59:11,409 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=306792.0, ans=0.0 2024-10-08 13:59:26,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=306912.0, ans=10.0 2024-10-08 13:59:47,204 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=307032.0, ans=0.125 2024-10-08 13:59:48,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=307032.0, ans=0.0 2024-10-08 13:59:55,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=307032.0, ans=0.125 2024-10-08 13:59:58,658 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.468e+02 4.134e+02 4.586e+02 5.271e+02 1.124e+03, threshold=9.171e+02, percent-clipped=1.0 2024-10-08 14:00:02,072 INFO [train.py:1136] (0/2) Epoch 32, batch 300, loss[loss=0.1672, simple_loss=0.2601, pruned_loss=0.03718, over 86369.00 frames. ], tot_loss[loss=0.1788, simple_loss=0.2786, pruned_loss=0.03951, over 13380154.82 frames. ], batch size: 197, lr: 7.99e-03, grad_scale: 16.0 2024-10-08 14:01:01,048 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=307512.0, ans=0.1 2024-10-08 14:01:16,963 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=307632.0, ans=0.035 2024-10-08 14:01:37,690 INFO [train.py:1136] (0/2) Epoch 32, batch 350, loss[loss=0.1802, simple_loss=0.2768, pruned_loss=0.0418, over 86986.00 frames. ], tot_loss[loss=0.1785, simple_loss=0.2781, pruned_loss=0.03946, over 14222193.00 frames. 
], batch size: 330, lr: 7.99e-03, grad_scale: 16.0 2024-10-08 14:02:54,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=308232.0, ans=0.0 2024-10-08 14:03:07,758 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.566e+02 4.075e+02 4.759e+02 5.114e+02 1.003e+03, threshold=9.519e+02, percent-clipped=1.0 2024-10-08 14:03:11,101 INFO [train.py:1136] (0/2) Epoch 32, batch 400, loss[loss=0.1807, simple_loss=0.2759, pruned_loss=0.04277, over 87300.00 frames. ], tot_loss[loss=0.1782, simple_loss=0.2777, pruned_loss=0.03935, over 14869158.96 frames. ], batch size: 313, lr: 7.98e-03, grad_scale: 32.0 2024-10-08 14:03:14,735 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=308352.0, ans=0.125 2024-10-08 14:03:25,660 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.57 vs. limit=15.0 2024-10-08 14:03:59,615 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=308592.0, ans=0.125 2024-10-08 14:04:08,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=308712.0, ans=0.0 2024-10-08 14:04:20,669 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.92 vs. limit=22.5 2024-10-08 14:04:39,207 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=308832.0, ans=0.0 2024-10-08 14:04:42,123 INFO [train.py:1136] (0/2) Epoch 32, batch 450, loss[loss=0.1739, simple_loss=0.2687, pruned_loss=0.03953, over 87295.00 frames. ], tot_loss[loss=0.1788, simple_loss=0.2788, pruned_loss=0.03943, over 15394062.52 frames. ], batch size: 280, lr: 7.97e-03, grad_scale: 16.0 2024-10-08 14:05:38,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=309312.0, ans=0.09899494936611666 2024-10-08 14:06:04,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=309432.0, ans=0.035 2024-10-08 14:06:04,412 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=309432.0, ans=0.0 2024-10-08 14:06:16,237 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.665e+02 4.060e+02 4.444e+02 4.960e+02 7.123e+02, threshold=8.888e+02, percent-clipped=0.0 2024-10-08 14:06:18,173 INFO [train.py:1136] (0/2) Epoch 32, batch 500, loss[loss=0.2008, simple_loss=0.3033, pruned_loss=0.04916, over 81959.00 frames. ], tot_loss[loss=0.1789, simple_loss=0.2789, pruned_loss=0.03949, over 15788983.09 frames. 
], batch size: 1245, lr: 7.96e-03, grad_scale: 16.0 2024-10-08 14:06:18,699 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=309552.0, ans=0.1 2024-10-08 14:06:33,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=309552.0, ans=0.95 2024-10-08 14:07:06,972 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=309792.0, ans=0.1 2024-10-08 14:07:48,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=310032.0, ans=0.1 2024-10-08 14:07:54,623 INFO [train.py:1136] (0/2) Epoch 32, batch 550, loss[loss=0.215, simple_loss=0.3108, pruned_loss=0.05956, over 78635.00 frames. ], tot_loss[loss=0.1792, simple_loss=0.2791, pruned_loss=0.03967, over 16080342.54 frames. ], batch size: 1493, lr: 7.96e-03, grad_scale: 16.0 2024-10-08 14:08:05,288 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=310152.0, ans=0.0 2024-10-08 14:08:34,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=310392.0, ans=0.125 2024-10-08 14:08:35,220 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.27 vs. limit=15.0 2024-10-08 14:09:00,171 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=310512.0, ans=0.125 2024-10-08 14:09:06,411 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=310512.0, ans=0.125 2024-10-08 14:09:17,060 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=310632.0, ans=0.2 2024-10-08 14:09:20,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=310632.0, ans=0.1 2024-10-08 14:09:28,280 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.712e+02 4.052e+02 4.634e+02 5.513e+02 1.450e+03, threshold=9.268e+02, percent-clipped=1.0 2024-10-08 14:09:30,081 INFO [train.py:1136] (0/2) Epoch 32, batch 600, loss[loss=0.1652, simple_loss=0.2601, pruned_loss=0.03513, over 86277.00 frames. ], tot_loss[loss=0.1802, simple_loss=0.2801, pruned_loss=0.04012, over 16296053.06 frames. ], batch size: 197, lr: 7.95e-03, grad_scale: 16.0 2024-10-08 14:10:16,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=310992.0, ans=0.2 2024-10-08 14:10:26,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=311112.0, ans=0.0 2024-10-08 14:10:44,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=311232.0, ans=0.125 2024-10-08 14:11:05,711 INFO [train.py:1136] (0/2) Epoch 32, batch 650, loss[loss=0.1679, simple_loss=0.2721, pruned_loss=0.03182, over 87423.00 frames. ], tot_loss[loss=0.1799, simple_loss=0.2799, pruned_loss=0.03999, over 16479009.14 frames. 
], batch size: 439, lr: 7.94e-03, grad_scale: 16.0 2024-10-08 14:11:18,165 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.52 vs. limit=10.0 2024-10-08 14:11:28,448 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=311472.0, ans=0.125 2024-10-08 14:11:51,443 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=311592.0, ans=0.1 2024-10-08 14:11:59,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=311592.0, ans=0.125 2024-10-08 14:12:30,262 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=2.22 vs. limit=12.0 2024-10-08 14:12:32,606 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.413e+02 3.998e+02 4.518e+02 5.081e+02 7.371e+02, threshold=9.036e+02, percent-clipped=0.0 2024-10-08 14:12:34,245 INFO [train.py:1136] (0/2) Epoch 32, batch 700, loss[loss=0.1661, simple_loss=0.2647, pruned_loss=0.03374, over 87262.00 frames. ], tot_loss[loss=0.1805, simple_loss=0.2805, pruned_loss=0.04026, over 16609265.51 frames. ], batch size: 264, lr: 7.94e-03, grad_scale: 16.0 2024-10-08 14:12:36,115 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=311952.0, ans=0.125 2024-10-08 14:12:52,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=312072.0, ans=0.125 2024-10-08 14:12:55,470 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=312072.0, ans=0.2 2024-10-08 14:13:13,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=312192.0, ans=0.2 2024-10-08 14:13:29,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=312312.0, ans=0.125 2024-10-08 14:13:31,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=312312.0, ans=0.2 2024-10-08 14:13:34,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=312312.0, ans=0.125 2024-10-08 14:13:57,495 INFO [train.py:1136] (0/2) Epoch 32, batch 750, loss[loss=0.1888, simple_loss=0.2908, pruned_loss=0.04337, over 85390.00 frames. ], tot_loss[loss=0.1808, simple_loss=0.2809, pruned_loss=0.04034, over 16711775.10 frames. ], batch size: 787, lr: 7.93e-03, grad_scale: 16.0 2024-10-08 14:14:14,293 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.23 vs. 
limit=22.5 2024-10-08 14:14:27,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=312672.0, ans=0.0 2024-10-08 14:14:29,223 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=312792.0, ans=0.1 2024-10-08 14:14:38,603 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=312792.0, ans=0.125 2024-10-08 14:14:51,326 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=312912.0, ans=0.0 2024-10-08 14:15:15,419 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=313032.0, ans=0.2 2024-10-08 14:15:18,280 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.425e+02 4.220e+02 4.841e+02 5.429e+02 7.357e+02, threshold=9.682e+02, percent-clipped=0.0 2024-10-08 14:15:19,886 INFO [train.py:1136] (0/2) Epoch 32, batch 800, loss[loss=0.179, simple_loss=0.2844, pruned_loss=0.03675, over 87316.00 frames. ], tot_loss[loss=0.1812, simple_loss=0.2815, pruned_loss=0.04039, over 16762408.90 frames. ], batch size: 393, lr: 7.92e-03, grad_scale: 32.0 2024-10-08 14:15:47,398 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-32.pt 2024-10-08 14:16:41,381 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=313344.0, ans=0.1 2024-10-08 14:16:42,705 INFO [train.py:1136] (0/2) Epoch 33, batch 0, loss[loss=0.1994, simple_loss=0.2996, pruned_loss=0.04966, over 81826.00 frames. ], tot_loss[loss=0.1994, simple_loss=0.2996, pruned_loss=0.04966, over 81826.00 frames. ], batch size: 1245, lr: 7.80e-03, grad_scale: 16.0 2024-10-08 14:16:42,706 INFO [train.py:1159] (0/2) Computing validation loss 2024-10-08 14:16:53,624 INFO [train.py:1168] (0/2) Epoch 33, validation: loss=0.1685, simple_loss=0.2795, pruned_loss=0.02875, over 1382211.00 frames. 2024-10-08 14:16:53,624 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB 2024-10-08 14:17:00,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=313344.0, ans=0.125 2024-10-08 14:17:22,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=313464.0, ans=0.2 2024-10-08 14:17:25,672 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=313464.0, ans=0.125 2024-10-08 14:17:33,758 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.80 vs. 
limit=15.0 2024-10-08 14:17:48,659 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=313704.0, ans=0.2 2024-10-08 14:18:07,102 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=313824.0, ans=0.0 2024-10-08 14:18:08,599 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=313824.0, ans=0.0 2024-10-08 14:18:23,485 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=313824.0, ans=0.2 2024-10-08 14:18:26,463 INFO [train.py:1136] (0/2) Epoch 33, batch 50, loss[loss=0.1796, simple_loss=0.2643, pruned_loss=0.04749, over 86130.00 frames. ], tot_loss[loss=0.1798, simple_loss=0.2798, pruned_loss=0.03989, over 3866134.79 frames. ], batch size: 180, lr: 7.79e-03, grad_scale: 16.0 2024-10-08 14:18:33,787 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=313944.0, ans=0.2 2024-10-08 14:18:34,527 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2024-10-08 14:18:38,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=313944.0, ans=0.1 2024-10-08 14:18:59,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=314064.0, ans=0.5 2024-10-08 14:19:11,645 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=314184.0, ans=0.125 2024-10-08 14:19:30,990 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.503e+02 3.987e+02 4.316e+02 4.753e+02 1.523e+03, threshold=8.631e+02, percent-clipped=1.0 2024-10-08 14:19:59,566 INFO [train.py:1136] (0/2) Epoch 33, batch 100, loss[loss=0.1639, simple_loss=0.271, pruned_loss=0.0284, over 87350.00 frames. ], tot_loss[loss=0.1783, simple_loss=0.2787, pruned_loss=0.03898, over 6819795.31 frames. ], batch size: 490, lr: 7.78e-03, grad_scale: 16.0 2024-10-08 14:20:11,833 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=314544.0, ans=0.025 2024-10-08 14:20:21,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=314664.0, ans=0.025 2024-10-08 14:20:36,487 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=314784.0, ans=0.125 2024-10-08 14:20:54,194 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.87 vs. limit=6.0 2024-10-08 14:21:35,448 INFO [train.py:1136] (0/2) Epoch 33, batch 150, loss[loss=0.1699, simple_loss=0.2683, pruned_loss=0.03571, over 87057.00 frames. ], tot_loss[loss=0.1785, simple_loss=0.2787, pruned_loss=0.03915, over 9125700.83 frames. 
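In the train.py:1136 records, loss[...] is the current batch while tot_loss[...] is a frame-weighted running average over the epoch so far, which is why its "over N frames" count grows batch by batch (3866134 frames at batch 50, 6819795 at batch 100, 9125700 at batch 150 above). A sketch of that accumulator:

    class RunningLoss:
        """Frame-weighted running average, like the tot_loss[...] fields."""

        def __init__(self):
            self.frames = 0.0
            self.weighted_sum = 0.0

        def update(self, batch_loss, batch_frames):
            self.weighted_sum += batch_loss * batch_frames
            self.frames += batch_frames

        @property
        def value(self):
            return self.weighted_sum / max(self.frames, 1.0)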
], batch size: 350, lr: 7.78e-03, grad_scale: 8.0 2024-10-08 14:21:43,116 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=315144.0, ans=0.0 2024-10-08 14:21:56,651 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=315264.0, ans=0.125 2024-10-08 14:22:22,281 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.96 vs. limit=15.0 2024-10-08 14:22:39,515 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=315504.0, ans=0.125 2024-10-08 14:22:41,304 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=315504.0, ans=0.125 2024-10-08 14:22:42,455 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.342e+02 4.107e+02 4.439e+02 4.948e+02 6.805e+02, threshold=8.879e+02, percent-clipped=0.0 2024-10-08 14:22:44,767 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=315504.0, ans=0.1 2024-10-08 14:22:46,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=315504.0, ans=0.0 2024-10-08 14:22:50,042 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=315624.0, ans=0.125 2024-10-08 14:22:59,116 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=315624.0, ans=0.0 2024-10-08 14:23:09,233 INFO [train.py:1136] (0/2) Epoch 33, batch 200, loss[loss=0.1859, simple_loss=0.287, pruned_loss=0.04243, over 86217.00 frames. ], tot_loss[loss=0.1788, simple_loss=0.2788, pruned_loss=0.03938, over 10879184.86 frames. ], batch size: 667, lr: 7.77e-03, grad_scale: 8.0 2024-10-08 14:23:17,241 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=315744.0, ans=0.125 2024-10-08 14:23:22,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=315744.0, ans=0.125 2024-10-08 14:23:27,681 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=315864.0, ans=0.0 2024-10-08 14:23:31,459 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.47 vs. limit=15.0 2024-10-08 14:24:07,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=316104.0, ans=0.0 2024-10-08 14:24:09,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=316104.0, ans=0.07 2024-10-08 14:24:13,003 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=316104.0, ans=0.1 2024-10-08 14:24:41,547 INFO [train.py:1136] (0/2) Epoch 33, batch 250, loss[loss=0.1695, simple_loss=0.2749, pruned_loss=0.03205, over 87337.00 frames. ], tot_loss[loss=0.1784, simple_loss=0.2784, pruned_loss=0.03919, over 12284381.93 frames. 
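The scaling.py:1024 lines log each Whiten module's measured metric against its limit (e.g. metric=10.96 vs. limit=15.0 above). The metric gauges how far the channel covariance of an activation is from a multiple of the identity: 1.0 for perfectly "white" features, larger as the covariance becomes more anisotropic. A plausible reconstruction of such a statistic; the exact formula in scaling.py may differ:

    import torch

    def whitening_metric(x):
        # x: (num_frames, num_channels). Returns a value >= 1.0, with
        # equality iff the covariance is proportional to the identity.
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.t() @ x) / x.shape[0]
        n = cov.shape[0]
        return ((cov * cov).mean() * n / cov.diagonal().mean() ** 2).item()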
], batch size: 490, lr: 7.76e-03, grad_scale: 8.0 2024-10-08 14:24:41,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=316344.0, ans=0.125 2024-10-08 14:25:01,816 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.82 vs. limit=15.0 2024-10-08 14:25:03,273 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=316464.0, ans=0.0 2024-10-08 14:25:03,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=316464.0, ans=0.0 2024-10-08 14:25:25,544 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.66 vs. limit=10.0 2024-10-08 14:25:44,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=316704.0, ans=0.015 2024-10-08 14:25:50,999 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.607e+02 3.918e+02 4.292e+02 5.040e+02 7.304e+02, threshold=8.583e+02, percent-clipped=0.0 2024-10-08 14:26:17,429 INFO [train.py:1136] (0/2) Epoch 33, batch 300, loss[loss=0.1909, simple_loss=0.2936, pruned_loss=0.04408, over 85411.00 frames. ], tot_loss[loss=0.1786, simple_loss=0.2786, pruned_loss=0.0393, over 13355877.93 frames. ], batch size: 786, lr: 7.76e-03, grad_scale: 8.0 2024-10-08 14:26:26,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=316944.0, ans=0.125 2024-10-08 14:26:44,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=317064.0, ans=0.0 2024-10-08 14:26:48,914 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=317064.0, ans=0.1 2024-10-08 14:26:59,444 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=317184.0, ans=0.125 2024-10-08 14:27:52,754 INFO [train.py:1136] (0/2) Epoch 33, batch 350, loss[loss=0.1887, simple_loss=0.2925, pruned_loss=0.04243, over 85912.00 frames. ], tot_loss[loss=0.1787, simple_loss=0.2784, pruned_loss=0.03954, over 14176091.96 frames. ], batch size: 721, lr: 7.75e-03, grad_scale: 8.0 2024-10-08 14:27:58,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=317544.0, ans=0.125 2024-10-08 14:28:11,174 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.22 vs. limit=10.0 2024-10-08 14:28:16,752 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.60 vs. limit=22.5 2024-10-08 14:28:27,432 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=3.31 vs. 
limit=12.0 2024-10-08 14:28:43,246 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=317784.0, ans=0.0 2024-10-08 14:28:59,896 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.583e+02 3.988e+02 4.521e+02 5.313e+02 9.650e+02, threshold=9.042e+02, percent-clipped=1.0 2024-10-08 14:29:21,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=318024.0, ans=0.1 2024-10-08 14:29:25,873 INFO [train.py:1136] (0/2) Epoch 33, batch 400, loss[loss=0.1896, simple_loss=0.292, pruned_loss=0.04357, over 85374.00 frames. ], tot_loss[loss=0.1791, simple_loss=0.279, pruned_loss=0.0396, over 14822313.04 frames. ], batch size: 787, lr: 7.74e-03, grad_scale: 16.0 2024-10-08 14:29:35,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=318144.0, ans=0.125 2024-10-08 14:29:39,054 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=318144.0, ans=0.1 2024-10-08 14:29:45,909 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=318264.0, ans=0.125 2024-10-08 14:29:49,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=318264.0, ans=0.0 2024-10-08 14:30:19,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=318384.0, ans=0.125 2024-10-08 14:30:48,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=318624.0, ans=0.0 2024-10-08 14:30:54,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=318624.0, ans=0.125 2024-10-08 14:30:58,927 INFO [train.py:1136] (0/2) Epoch 33, batch 450, loss[loss=0.2199, simple_loss=0.3111, pruned_loss=0.06439, over 78356.00 frames. ], tot_loss[loss=0.1789, simple_loss=0.279, pruned_loss=0.03939, over 15347803.43 frames. ], batch size: 1493, lr: 7.73e-03, grad_scale: 16.0 2024-10-08 14:31:22,599 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.24 vs. limit=6.0 2024-10-08 14:31:47,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=318984.0, ans=0.035 2024-10-08 14:32:04,183 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.10 vs. 
limit=6.0 2024-10-08 14:32:04,405 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.389e+02 4.056e+02 4.380e+02 5.011e+02 7.431e+02, threshold=8.760e+02, percent-clipped=0.0 2024-10-08 14:32:10,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=319104.0, ans=0.05 2024-10-08 14:32:10,843 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-10-08 14:32:15,891 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=319224.0, ans=0.0 2024-10-08 14:32:27,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=319224.0, ans=0.5 2024-10-08 14:32:33,497 INFO [train.py:1136] (0/2) Epoch 33, batch 500, loss[loss=0.2025, simple_loss=0.2967, pruned_loss=0.05414, over 69275.00 frames. ], tot_loss[loss=0.1789, simple_loss=0.2788, pruned_loss=0.03949, over 15738755.64 frames. ], batch size: 1960, lr: 7.73e-03, grad_scale: 16.0 2024-10-08 14:32:46,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=319344.0, ans=0.1 2024-10-08 14:33:29,079 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0 2024-10-08 14:33:37,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=319704.0, ans=0.125 2024-10-08 14:33:42,922 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.62 vs. limit=22.5 2024-10-08 14:34:08,790 INFO [train.py:1136] (0/2) Epoch 33, batch 550, loss[loss=0.1856, simple_loss=0.286, pruned_loss=0.04262, over 86960.00 frames. ], tot_loss[loss=0.179, simple_loss=0.279, pruned_loss=0.03953, over 16057381.53 frames. ], batch size: 583, lr: 7.72e-03, grad_scale: 16.0 2024-10-08 14:34:34,183 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.30 vs. limit=15.0 2024-10-08 14:34:35,809 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.13 vs. limit=15.0 2024-10-08 14:34:48,168 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=320184.0, ans=0.2 2024-10-08 14:35:15,880 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.525e+02 4.125e+02 4.641e+02 5.455e+02 8.266e+02, threshold=9.283e+02, percent-clipped=0.0 2024-10-08 14:35:18,148 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=320304.0, ans=0.0 2024-10-08 14:35:21,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=320424.0, ans=0.0 2024-10-08 14:35:28,232 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.15 vs. 
limit=22.5 2024-10-08 14:35:29,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=320424.0, ans=0.05 2024-10-08 14:35:33,452 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.41 vs. limit=10.0 2024-10-08 14:35:44,883 INFO [train.py:1136] (0/2) Epoch 33, batch 600, loss[loss=0.2173, simple_loss=0.3139, pruned_loss=0.0603, over 78455.00 frames. ], tot_loss[loss=0.1794, simple_loss=0.2795, pruned_loss=0.03961, over 16269347.50 frames. ], batch size: 1493, lr: 7.71e-03, grad_scale: 16.0 2024-10-08 14:35:49,200 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=320544.0, ans=0.1 2024-10-08 14:35:58,640 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=320544.0, ans=0.125 2024-10-08 14:36:43,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=320904.0, ans=0.125 2024-10-08 14:37:16,884 INFO [train.py:1136] (0/2) Epoch 33, batch 650, loss[loss=0.1682, simple_loss=0.2685, pruned_loss=0.03393, over 86672.00 frames. ], tot_loss[loss=0.1789, simple_loss=0.2789, pruned_loss=0.03946, over 16454551.37 frames. ], batch size: 246, lr: 7.71e-03, grad_scale: 16.0 2024-10-08 14:37:17,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=321144.0, ans=0.5 2024-10-08 14:37:39,500 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.63 vs. limit=15.0 2024-10-08 14:37:40,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=321264.0, ans=0.125 2024-10-08 14:37:48,579 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.50 vs. 
limit=15.0 2024-10-08 14:38:03,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=321384.0, ans=0.2 2024-10-08 14:38:05,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=321384.0, ans=0.125 2024-10-08 14:38:16,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=321504.0, ans=0.125 2024-10-08 14:38:20,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=321504.0, ans=0.125 2024-10-08 14:38:26,293 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=321504.0, ans=0.2 2024-10-08 14:38:26,295 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=321504.0, ans=0.0 2024-10-08 14:38:27,554 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.393e+02 3.988e+02 4.376e+02 4.932e+02 8.234e+02, threshold=8.753e+02, percent-clipped=0.0 2024-10-08 14:38:34,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=321624.0, ans=0.0 2024-10-08 14:38:34,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=321624.0, ans=0.0 2024-10-08 14:38:50,007 INFO [train.py:1136] (0/2) Epoch 33, batch 700, loss[loss=0.1704, simple_loss=0.2735, pruned_loss=0.03369, over 87326.00 frames. ], tot_loss[loss=0.1786, simple_loss=0.2787, pruned_loss=0.0393, over 16627267.45 frames. ], batch size: 464, lr: 7.70e-03, grad_scale: 16.0 2024-10-08 14:38:56,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=321744.0, ans=0.125 2024-10-08 14:39:14,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=321864.0, ans=0.125 2024-10-08 14:39:16,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=321864.0, ans=0.125 2024-10-08 14:39:29,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=321984.0, ans=0.125 2024-10-08 14:39:47,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=322104.0, ans=0.0 2024-10-08 14:39:59,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=322224.0, ans=0.0 2024-10-08 14:40:10,179 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.55 vs. limit=22.5 2024-10-08 14:40:12,069 INFO [train.py:1136] (0/2) Epoch 33, batch 750, loss[loss=0.1822, simple_loss=0.2876, pruned_loss=0.03841, over 86038.00 frames. ], tot_loss[loss=0.1791, simple_loss=0.2794, pruned_loss=0.03938, over 16687898.13 frames. 
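Many of the scheduled parameters above are *_skip_rate, bypass.skip_rate and layerdrop_rate values: probabilities of stochastically bypassing a submodule (a convolution, feed-forward block, attention, or an entire layer) during training, generally annealed toward 0.0 as the logged ans values show. A minimal sketch of such a gate, with the submodule and rate as placeholders:

    import torch

    def maybe_bypass(submodule, x, skip_rate, training=True):
        # With probability skip_rate, skip the submodule and pass the input
        # through unchanged; at the annealed endpoint skip_rate=0.0 this is
        # simply submodule(x). At inference time nothing is skipped.
        if training and torch.rand(()).item() < skip_rate:
            return x
        return submodule(x)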
], batch size: 721, lr: 7.69e-03, grad_scale: 8.0 2024-10-08 14:40:15,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=322344.0, ans=0.04949747468305833 2024-10-08 14:40:24,017 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.46 vs. limit=12.0 2024-10-08 14:40:43,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=322584.0, ans=0.1 2024-10-08 14:41:13,741 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.559e+02 4.237e+02 4.504e+02 5.191e+02 2.182e+03, threshold=9.008e+02, percent-clipped=1.0 2024-10-08 14:41:19,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=322824.0, ans=0.125 2024-10-08 14:41:27,791 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.58 vs. limit=15.0 2024-10-08 14:41:36,317 INFO [train.py:1136] (0/2) Epoch 33, batch 800, loss[loss=0.2155, simple_loss=0.3101, pruned_loss=0.06048, over 78729.00 frames. ], tot_loss[loss=0.1798, simple_loss=0.2802, pruned_loss=0.03973, over 16719365.55 frames. ], batch size: 1493, lr: 7.69e-03, grad_scale: 16.0 2024-10-08 14:41:47,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=322944.0, ans=0.125 2024-10-08 14:42:00,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=323064.0, ans=0.0 2024-10-08 14:42:02,960 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-33.pt 2024-10-08 14:43:01,709 INFO [train.py:1136] (0/2) Epoch 34, batch 0, loss[loss=0.1781, simple_loss=0.283, pruned_loss=0.03661, over 87455.00 frames. ], tot_loss[loss=0.1781, simple_loss=0.283, pruned_loss=0.03661, over 87455.00 frames. ], batch size: 372, lr: 7.57e-03, grad_scale: 32.0 2024-10-08 14:43:01,710 INFO [train.py:1159] (0/2) Computing validation loss 2024-10-08 14:43:12,636 INFO [train.py:1168] (0/2) Epoch 34, validation: loss=0.1677, simple_loss=0.2781, pruned_loss=0.02862, over 1382211.00 frames. 2024-10-08 14:43:12,637 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB 2024-10-08 14:43:20,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=323136.0, ans=0.0 2024-10-08 14:43:20,208 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=323136.0, ans=0.125 2024-10-08 14:43:49,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=323376.0, ans=0.04949747468305833 2024-10-08 14:44:23,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=323496.0, ans=0.125 2024-10-08 14:44:48,079 INFO [train.py:1136] (0/2) Epoch 34, batch 50, loss[loss=0.18, simple_loss=0.2764, pruned_loss=0.04176, over 87238.00 frames. ], tot_loss[loss=0.1793, simple_loss=0.2797, pruned_loss=0.03941, over 3885290.08 frames. 
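The checkpoint.py:75 lines mark the end-of-epoch saves to zipformer/exp/epoch-NN.pt (a periodic zipformer/exp/checkpoint-28000.pt save by batch count also appears later in this log). A minimal sketch of such a save; the key names are assumptions, since the log does not show what icefall actually stores:

    import torch

    def save_checkpoint(path, model, optimizer, scheduler, epoch, batch_idx):
        torch.save(
            {
                "model": model.state_dict(),        # assumed key names
                "optimizer": optimizer.state_dict(),
                "scheduler": scheduler.state_dict(),
                "epoch": epoch,
                "batch_idx_train": batch_idx,
            },
            path,
        )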
], batch size: 296, lr: 7.56e-03, grad_scale: 32.0 2024-10-08 14:44:48,868 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-10-08 14:44:59,148 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=323736.0, ans=0.125 2024-10-08 14:45:23,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=323976.0, ans=0.2 2024-10-08 14:45:26,609 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.528e+02 4.018e+02 4.693e+02 5.276e+02 9.473e+02, threshold=9.386e+02, percent-clipped=1.0 2024-10-08 14:45:34,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=323976.0, ans=0.125 2024-10-08 14:45:35,839 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=323976.0, ans=0.125 2024-10-08 14:45:50,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=324096.0, ans=0.0 2024-10-08 14:45:55,872 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=324096.0, ans=0.125 2024-10-08 14:45:57,637 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=324096.0, ans=0.1 2024-10-08 14:46:19,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=324216.0, ans=0.0 2024-10-08 14:46:21,967 INFO [train.py:1136] (0/2) Epoch 34, batch 100, loss[loss=0.1856, simple_loss=0.2862, pruned_loss=0.04252, over 86906.00 frames. ], tot_loss[loss=0.1787, simple_loss=0.2786, pruned_loss=0.03938, over 6793489.76 frames. ], batch size: 583, lr: 7.56e-03, grad_scale: 32.0 2024-10-08 14:46:44,147 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=324456.0, ans=0.0 2024-10-08 14:46:48,321 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.68 vs. limit=22.5 2024-10-08 14:46:48,347 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=5.04 vs. limit=15.0 2024-10-08 14:47:22,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=324696.0, ans=0.125 2024-10-08 14:47:41,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=324816.0, ans=0.1 2024-10-08 14:47:43,528 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=324816.0, ans=0.1 2024-10-08 14:47:58,434 INFO [train.py:1136] (0/2) Epoch 34, batch 150, loss[loss=0.1711, simple_loss=0.2772, pruned_loss=0.03244, over 87330.00 frames. ], tot_loss[loss=0.1789, simple_loss=0.2793, pruned_loss=0.0392, over 9088273.57 frames. ], batch size: 490, lr: 7.55e-03, grad_scale: 32.0 2024-10-08 14:48:06,960 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.34 vs. 
limit=6.0 2024-10-08 14:48:16,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=325056.0, ans=0.125 2024-10-08 14:48:20,690 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.37 vs. limit=22.5 2024-10-08 14:48:35,169 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-10-08 14:48:39,784 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.572e+02 4.241e+02 4.958e+02 5.759e+02 9.036e+02, threshold=9.916e+02, percent-clipped=0.0 2024-10-08 14:48:40,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=325176.0, ans=0.125 2024-10-08 14:49:11,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=325296.0, ans=0.2 2024-10-08 14:49:15,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=325416.0, ans=0.0 2024-10-08 14:49:32,438 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=325536.0, ans=0.05 2024-10-08 14:49:33,774 INFO [train.py:1136] (0/2) Epoch 34, batch 200, loss[loss=0.1637, simple_loss=0.2585, pruned_loss=0.03441, over 86224.00 frames. ], tot_loss[loss=0.1779, simple_loss=0.278, pruned_loss=0.03892, over 10860258.20 frames. ], batch size: 197, lr: 7.54e-03, grad_scale: 32.0 2024-10-08 14:49:37,606 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=325536.0, ans=0.2 2024-10-08 14:49:51,202 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=325656.0, ans=0.125 2024-10-08 14:50:02,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=325656.0, ans=0.2 2024-10-08 14:50:29,169 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.86 vs. limit=15.0 2024-10-08 14:50:55,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=326016.0, ans=0.125 2024-10-08 14:50:55,304 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=326016.0, ans=0.125 2024-10-08 14:51:07,263 INFO [train.py:1136] (0/2) Epoch 34, batch 250, loss[loss=0.172, simple_loss=0.2681, pruned_loss=0.03795, over 87331.00 frames. ], tot_loss[loss=0.1788, simple_loss=0.2789, pruned_loss=0.03931, over 12201164.43 frames. ], batch size: 280, lr: 7.54e-03, grad_scale: 16.0 2024-10-08 14:51:19,492 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.79 vs. limit=15.0 2024-10-08 14:51:20,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=326136.0, ans=0.0 2024-10-08 14:51:34,462 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.51 vs. 
limit=15.0 2024-10-08 14:51:49,617 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.462e+02 4.123e+02 4.482e+02 5.228e+02 7.642e+02, threshold=8.964e+02, percent-clipped=0.0 2024-10-08 14:51:57,122 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.36 vs. limit=22.5 2024-10-08 14:52:00,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=326376.0, ans=0.125 2024-10-08 14:52:42,134 INFO [train.py:1136] (0/2) Epoch 34, batch 300, loss[loss=0.1703, simple_loss=0.2719, pruned_loss=0.03432, over 87048.00 frames. ], tot_loss[loss=0.1788, simple_loss=0.279, pruned_loss=0.03932, over 13273092.41 frames. ], batch size: 350, lr: 7.53e-03, grad_scale: 16.0 2024-10-08 14:53:30,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=326976.0, ans=0.125 2024-10-08 14:53:46,705 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.11 vs. limit=10.0 2024-10-08 14:54:06,197 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=2.93 vs. limit=12.0 2024-10-08 14:54:17,763 INFO [train.py:1136] (0/2) Epoch 34, batch 350, loss[loss=0.17, simple_loss=0.2682, pruned_loss=0.03595, over 86659.00 frames. ], tot_loss[loss=0.1789, simple_loss=0.2791, pruned_loss=0.03937, over 14127606.97 frames. ], batch size: 229, lr: 7.52e-03, grad_scale: 16.0 2024-10-08 14:54:39,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=327456.0, ans=0.0 2024-10-08 14:54:56,859 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.466e+02 4.126e+02 4.608e+02 5.559e+02 7.843e+02, threshold=9.215e+02, percent-clipped=0.0 2024-10-08 14:55:10,836 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=327696.0, ans=0.1 2024-10-08 14:55:50,874 INFO [train.py:1136] (0/2) Epoch 34, batch 400, loss[loss=0.1869, simple_loss=0.2905, pruned_loss=0.04163, over 85657.00 frames. ], tot_loss[loss=0.1784, simple_loss=0.2787, pruned_loss=0.03905, over 14794455.69 frames. ], batch size: 787, lr: 7.52e-03, grad_scale: 32.0 2024-10-08 14:55:51,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=327936.0, ans=0.2 2024-10-08 14:56:07,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=327936.0, ans=0.125 2024-10-08 14:56:12,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=328056.0, ans=0.0 2024-10-08 14:56:13,188 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.63 vs. 
limit=10.0 2024-10-08 14:56:14,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=328056.0, ans=0.1 2024-10-08 14:56:33,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=328176.0, ans=0.0 2024-10-08 14:56:56,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=328296.0, ans=0.125 2024-10-08 14:57:27,055 INFO [train.py:1136] (0/2) Epoch 34, batch 450, loss[loss=0.1641, simple_loss=0.2693, pruned_loss=0.02952, over 87364.00 frames. ], tot_loss[loss=0.178, simple_loss=0.2782, pruned_loss=0.03892, over 15290201.92 frames. ], batch size: 439, lr: 7.51e-03, grad_scale: 32.0 2024-10-08 14:57:29,417 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=328536.0, ans=0.1 2024-10-08 14:57:48,665 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=328656.0, ans=0.125 2024-10-08 14:58:07,362 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.333e+02 4.308e+02 4.653e+02 5.532e+02 8.953e+02, threshold=9.305e+02, percent-clipped=0.0 2024-10-08 14:58:08,496 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.41 vs. limit=15.0 2024-10-08 14:58:18,674 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-10-08 14:58:20,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=328776.0, ans=0.0 2024-10-08 14:58:45,837 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=329016.0, ans=0.125 2024-10-08 14:59:00,691 INFO [train.py:1136] (0/2) Epoch 34, batch 500, loss[loss=0.1861, simple_loss=0.2866, pruned_loss=0.04282, over 87089.00 frames. ], tot_loss[loss=0.1782, simple_loss=0.2785, pruned_loss=0.039, over 15662614.08 frames. ], batch size: 583, lr: 7.51e-03, grad_scale: 32.0 2024-10-08 14:59:02,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=329136.0, ans=0.125 2024-10-08 14:59:08,655 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-10-08 14:59:17,881 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.83 vs. limit=15.0 2024-10-08 14:59:19,583 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.95 vs. limit=15.0 2024-10-08 14:59:54,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=329376.0, ans=0.125 2024-10-08 15:00:36,639 INFO [train.py:1136] (0/2) Epoch 34, batch 550, loss[loss=0.1915, simple_loss=0.2942, pruned_loss=0.04435, over 85306.00 frames. ], tot_loss[loss=0.178, simple_loss=0.2783, pruned_loss=0.03884, over 15990857.75 frames. 
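The grad_scale field in the batch records (8.0, 16.0 and 32.0 at different points in this log) is the loss-scaling factor of float16 mixed-precision training: it is grown while steps succeed and cut back when gradients overflow. With PyTorch's stock scaler the step looks roughly like this (a generic sketch, not icefall's exact training loop):

    import torch

    scaler = torch.cuda.amp.GradScaler()

    def train_step(model, optimizer, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = model(batch)             # forward in half precision
        scaler.scale(loss).backward()       # backprop on the scaled loss
        scaler.step(optimizer)              # unscales; skips on overflow
        scaler.update()                     # grow or shrink the scale
        return loss.detach(), scaler.get_scale()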
], batch size: 866, lr: 7.50e-03, grad_scale: 16.0 2024-10-08 15:00:40,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=329736.0, ans=0.125 2024-10-08 15:00:46,454 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=329736.0, ans=0.0 2024-10-08 15:01:00,624 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=329856.0, ans=6.0 2024-10-08 15:01:05,703 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.06 vs. limit=6.0 2024-10-08 15:01:12,267 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=329976.0, ans=0.2 2024-10-08 15:01:19,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=329976.0, ans=0.125 2024-10-08 15:01:21,148 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.538e+02 4.167e+02 4.625e+02 5.434e+02 7.269e+02, threshold=9.250e+02, percent-clipped=0.0 2024-10-08 15:01:21,692 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=329976.0, ans=0.125 2024-10-08 15:01:23,673 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.05 vs. limit=15.0 2024-10-08 15:01:41,606 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.73 vs. limit=15.0 2024-10-08 15:01:56,616 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.02 vs. limit=6.0 2024-10-08 15:02:13,034 INFO [train.py:1136] (0/2) Epoch 34, batch 600, loss[loss=0.1692, simple_loss=0.2644, pruned_loss=0.03703, over 86577.00 frames. ], tot_loss[loss=0.1783, simple_loss=0.2784, pruned_loss=0.03909, over 16225916.74 frames. ], batch size: 246, lr: 7.49e-03, grad_scale: 16.0 2024-10-08 15:02:44,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=330456.0, ans=0.2 2024-10-08 15:03:16,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=330696.0, ans=0.125 2024-10-08 15:03:46,490 INFO [train.py:1136] (0/2) Epoch 34, batch 650, loss[loss=0.1873, simple_loss=0.2852, pruned_loss=0.04473, over 86981.00 frames. ], tot_loss[loss=0.1777, simple_loss=0.2778, pruned_loss=0.03874, over 16457321.44 frames. 
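The balancer entries scattered through these records (prob, min_positive, max_positive, min_abs, each a ScheduledFloat) describe activation balancers: modules that watch per-channel statistics and, with probability prob, apply a small corrective gradient when the fraction of positive activations or the mean absolute value leaves its configured band. A sketch of the measurement half only; the gradient correction itself is more involved:

    import torch

    def balancer_violations(x, min_positive=0.05, max_positive=0.95, min_abs=0.5):
        # x: (num_frames, num_channels). Per-channel checks matching the
        # min_positive / max_positive / min_abs names in the log; the
        # default bounds here are examples taken from logged ans values.
        frac_positive = (x > 0).float().mean(dim=0)
        mean_abs = x.abs().mean(dim=0)
        return {
            "too_few_positive": frac_positive < min_positive,
            "too_many_positive": frac_positive > max_positive,
            "too_small_magnitude": mean_abs < min_abs,
        }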
], batch size: 583, lr: 7.49e-03, grad_scale: 16.0 2024-10-08 15:03:48,905 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-10-08 15:03:57,284 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=330936.0, ans=0.09899494936611666 2024-10-08 15:04:04,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=331056.0, ans=0.025 2024-10-08 15:04:21,140 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=331056.0, ans=0.2 2024-10-08 15:04:29,021 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.678e+02 4.032e+02 4.517e+02 5.059e+02 6.826e+02, threshold=9.034e+02, percent-clipped=0.0 2024-10-08 15:04:41,076 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=331296.0, ans=0.025 2024-10-08 15:04:55,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=331416.0, ans=0.05 2024-10-08 15:05:09,201 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.57 vs. limit=15.0 2024-10-08 15:05:10,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.whiten.whitening_limit, batch_count=331416.0, ans=12.0 2024-10-08 15:05:12,833 INFO [train.py:1136] (0/2) Epoch 34, batch 700, loss[loss=0.1723, simple_loss=0.2685, pruned_loss=0.03805, over 86711.00 frames. ], tot_loss[loss=0.1777, simple_loss=0.2777, pruned_loss=0.03882, over 16605266.93 frames. ], batch size: 246, lr: 7.48e-03, grad_scale: 16.0 2024-10-08 15:05:16,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=331536.0, ans=0.015 2024-10-08 15:05:18,253 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=331536.0, ans=0.125 2024-10-08 15:05:31,257 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.36 vs. limit=15.0 2024-10-08 15:05:40,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=331656.0, ans=0.1 2024-10-08 15:06:03,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=331896.0, ans=0.0 2024-10-08 15:06:06,938 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=331896.0, ans=0.1 2024-10-08 15:06:25,058 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.72 vs. limit=15.0 2024-10-08 15:06:36,714 INFO [train.py:1136] (0/2) Epoch 34, batch 750, loss[loss=0.1662, simple_loss=0.2671, pruned_loss=0.03266, over 87114.00 frames. ], tot_loss[loss=0.1781, simple_loss=0.278, pruned_loss=0.03904, over 16705088.66 frames. 
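The scaling.py:1120 WithLoss lines attach an auxiliary penalty to the named attention weights and report its running sum; loss-sum=0.000e+00 here means no penalty accrued at these points. One way to get that shape of behavior is an identity forward pass whose backward pass injects the penalty's gradient. The sketch below uses a toy L2 penalty purely for illustration; the recipe's actual penalty is not shown in the log:

    import torch

    class IdentityWithPenalty(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, penalty_scale):
            ctx.save_for_backward(x)
            ctx.penalty_scale = penalty_scale
            return x  # the wrapped tensor passes through unchanged

        @staticmethod
        def backward(ctx, grad_out):
            (x,) = ctx.saved_tensors
            # add d/dx of penalty_scale * mean(x ** 2) to the upstream grad
            return grad_out + ctx.penalty_scale * 2.0 * x / x.numel(), None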
], batch size: 350, lr: 7.47e-03, grad_scale: 16.0 2024-10-08 15:06:43,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=332136.0, ans=0.2 2024-10-08 15:07:14,522 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.488e+02 4.011e+02 4.508e+02 5.417e+02 1.713e+03, threshold=9.016e+02, percent-clipped=1.0 2024-10-08 15:07:18,680 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.24 vs. limit=10.0 2024-10-08 15:07:21,115 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=332376.0, ans=0.0 2024-10-08 15:07:37,928 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=332496.0, ans=0.2 2024-10-08 15:07:40,996 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-10-08 15:07:59,591 INFO [train.py:1136] (0/2) Epoch 34, batch 800, loss[loss=0.2009, simple_loss=0.2952, pruned_loss=0.05335, over 69040.00 frames. ], tot_loss[loss=0.1786, simple_loss=0.2788, pruned_loss=0.03922, over 16748226.86 frames. ], batch size: 1960, lr: 7.47e-03, grad_scale: 32.0 2024-10-08 15:08:08,226 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.72 vs. limit=22.5 2024-10-08 15:08:09,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=332736.0, ans=0.2 2024-10-08 15:08:24,289 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-34.pt 2024-10-08 15:09:35,041 INFO [train.py:1136] (0/2) Epoch 35, batch 0, loss[loss=0.1719, simple_loss=0.2753, pruned_loss=0.03425, over 87338.00 frames. ], tot_loss[loss=0.1719, simple_loss=0.2753, pruned_loss=0.03425, over 87338.00 frames. ], batch size: 415, lr: 7.36e-03, grad_scale: 32.0 2024-10-08 15:09:35,043 INFO [train.py:1159] (0/2) Computing validation loss 2024-10-08 15:09:47,950 INFO [train.py:1168] (0/2) Epoch 35, validation: loss=0.1676, simple_loss=0.2773, pruned_loss=0.02895, over 1382211.00 frames. 2024-10-08 15:09:47,951 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB 2024-10-08 15:10:29,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=333168.0, ans=0.0 2024-10-08 15:10:39,770 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=333288.0, ans=0.0 2024-10-08 15:10:47,853 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=333288.0, ans=0.125 2024-10-08 15:11:20,572 INFO [train.py:1136] (0/2) Epoch 35, batch 50, loss[loss=0.1698, simple_loss=0.2666, pruned_loss=0.03644, over 86508.00 frames. ], tot_loss[loss=0.1764, simple_loss=0.2761, pruned_loss=0.03837, over 3894611.47 frames. 
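The "Maximum memory allocated so far" figures printed with each validation pass (55604MB throughout the epochs shown here) come from CUDA's peak-memory counter, which a single call reproduces:

    import torch

    peak_mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
    print(f"Maximum memory allocated so far is {peak_mb}MB")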
], batch size: 229, lr: 7.35e-03, grad_scale: 32.0 2024-10-08 15:11:32,937 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.423e+02 4.163e+02 4.853e+02 5.400e+02 8.035e+02, threshold=9.706e+02, percent-clipped=0.0 2024-10-08 15:11:43,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=333648.0, ans=0.125 2024-10-08 15:12:14,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=333768.0, ans=0.125 2024-10-08 15:12:47,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=334008.0, ans=0.0 2024-10-08 15:12:56,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=334128.0, ans=0.0 2024-10-08 15:12:57,605 INFO [train.py:1136] (0/2) Epoch 35, batch 100, loss[loss=0.1665, simple_loss=0.2707, pruned_loss=0.03115, over 87337.00 frames. ], tot_loss[loss=0.1778, simple_loss=0.2777, pruned_loss=0.03899, over 6809112.11 frames. ], batch size: 464, lr: 7.34e-03, grad_scale: 32.0 2024-10-08 15:13:10,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=334128.0, ans=0.125 2024-10-08 15:13:28,879 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.36 vs. limit=15.0 2024-10-08 15:14:27,273 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.01 vs. limit=15.0 2024-10-08 15:14:33,182 INFO [train.py:1136] (0/2) Epoch 35, batch 150, loss[loss=0.1682, simple_loss=0.2665, pruned_loss=0.03491, over 87385.00 frames. ], tot_loss[loss=0.1782, simple_loss=0.2779, pruned_loss=0.03919, over 9048173.75 frames. ], batch size: 264, lr: 7.34e-03, grad_scale: 32.0 2024-10-08 15:14:35,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=334728.0, ans=0.2 2024-10-08 15:14:44,175 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.75 vs. limit=6.0 2024-10-08 15:14:45,113 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.583e+02 4.075e+02 4.475e+02 4.990e+02 6.525e+02, threshold=8.950e+02, percent-clipped=0.0 2024-10-08 15:14:47,462 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=334728.0, ans=0.125 2024-10-08 15:14:50,792 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=334848.0, ans=0.0 2024-10-08 15:14:52,840 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.74 vs. 
limit=10.0 2024-10-08 15:15:07,076 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=334848.0, ans=0.0 2024-10-08 15:15:24,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=334968.0, ans=0.0 2024-10-08 15:15:26,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=334968.0, ans=0.0 2024-10-08 15:15:58,234 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=335208.0, ans=0.0 2024-10-08 15:16:06,368 INFO [train.py:1136] (0/2) Epoch 35, batch 200, loss[loss=0.1812, simple_loss=0.2875, pruned_loss=0.03743, over 84378.00 frames. ], tot_loss[loss=0.1781, simple_loss=0.278, pruned_loss=0.03915, over 10821390.16 frames. ], batch size: 958, lr: 7.33e-03, grad_scale: 16.0 2024-10-08 15:16:06,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=335328.0, ans=0.125 2024-10-08 15:16:21,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=335328.0, ans=0.025 2024-10-08 15:16:22,843 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=335328.0, ans=0.0 2024-10-08 15:16:35,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=335448.0, ans=0.125 2024-10-08 15:17:21,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=335808.0, ans=0.125 2024-10-08 15:17:21,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=335808.0, ans=0.125 2024-10-08 15:17:40,078 INFO [train.py:1136] (0/2) Epoch 35, batch 250, loss[loss=0.1703, simple_loss=0.2667, pruned_loss=0.03696, over 87268.00 frames. ], tot_loss[loss=0.1778, simple_loss=0.2778, pruned_loss=0.03888, over 12236967.87 frames. ], batch size: 280, lr: 7.33e-03, grad_scale: 16.0 2024-10-08 15:17:52,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=335928.0, ans=0.125 2024-10-08 15:17:53,567 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/checkpoint-28000.pt 2024-10-08 15:18:02,149 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.563e+02 4.252e+02 4.945e+02 5.642e+02 8.243e+02, threshold=9.889e+02, percent-clipped=0.0 2024-10-08 15:19:15,741 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.10 vs. limit=15.0 2024-10-08 15:19:19,840 INFO [train.py:1136] (0/2) Epoch 35, batch 300, loss[loss=0.1865, simple_loss=0.2881, pruned_loss=0.04245, over 86416.00 frames. ], tot_loss[loss=0.1773, simple_loss=0.2774, pruned_loss=0.03866, over 13328921.83 frames. ], batch size: 668, lr: 7.32e-03, grad_scale: 16.0 2024-10-08 15:19:42,381 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.06 vs. 
limit=15.0 2024-10-08 15:19:53,210 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.10 vs. limit=15.0 2024-10-08 15:19:59,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=336768.0, ans=0.0 2024-10-08 15:20:20,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=336888.0, ans=0.1 2024-10-08 15:20:41,193 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=337008.0, ans=0.1 2024-10-08 15:20:47,818 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=337008.0, ans=0.125 2024-10-08 15:20:52,350 INFO [train.py:1136] (0/2) Epoch 35, batch 350, loss[loss=0.1632, simple_loss=0.2623, pruned_loss=0.03209, over 87052.00 frames. ], tot_loss[loss=0.177, simple_loss=0.277, pruned_loss=0.03852, over 14172901.58 frames. ], batch size: 330, lr: 7.31e-03, grad_scale: 16.0 2024-10-08 15:20:53,248 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.55 vs. limit=15.0 2024-10-08 15:21:06,497 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.491e+02 3.962e+02 4.518e+02 5.168e+02 7.333e+02, threshold=9.036e+02, percent-clipped=0.0 2024-10-08 15:21:23,957 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.56 vs. limit=22.5 2024-10-08 15:21:59,253 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.33 vs. limit=15.0 2024-10-08 15:22:20,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=337608.0, ans=0.1 2024-10-08 15:22:24,016 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-10-08 15:22:25,160 INFO [train.py:1136] (0/2) Epoch 35, batch 400, loss[loss=0.1684, simple_loss=0.2675, pruned_loss=0.03465, over 87137.00 frames. ], tot_loss[loss=0.178, simple_loss=0.2778, pruned_loss=0.0391, over 14781086.08 frames. ], batch size: 350, lr: 7.31e-03, grad_scale: 32.0 2024-10-08 15:22:41,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=337848.0, ans=0.0 2024-10-08 15:23:37,637 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=338208.0, ans=10.0 2024-10-08 15:23:39,406 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=338208.0, ans=0.125 2024-10-08 15:23:59,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=338328.0, ans=0.0 2024-10-08 15:24:00,606 INFO [train.py:1136] (0/2) Epoch 35, batch 450, loss[loss=0.1659, simple_loss=0.2701, pruned_loss=0.03082, over 87389.00 frames. ], tot_loss[loss=0.1778, simple_loss=0.2777, pruned_loss=0.03897, over 15312940.72 frames. 
], batch size: 393, lr: 7.30e-03, grad_scale: 32.0 2024-10-08 15:24:14,846 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.550e+02 4.091e+02 4.535e+02 5.096e+02 6.275e+02, threshold=9.070e+02, percent-clipped=0.0 2024-10-08 15:24:20,852 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=3.91 vs. limit=15.0 2024-10-08 15:25:06,400 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=338688.0, ans=0.1 2024-10-08 15:25:19,104 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=338808.0, ans=0.125 2024-10-08 15:25:34,507 INFO [train.py:1136] (0/2) Epoch 35, batch 500, loss[loss=0.1658, simple_loss=0.2707, pruned_loss=0.03038, over 87257.00 frames. ], tot_loss[loss=0.1779, simple_loss=0.2777, pruned_loss=0.03902, over 15696816.70 frames. ], batch size: 439, lr: 7.29e-03, grad_scale: 32.0 2024-10-08 15:25:52,145 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.27 vs. limit=15.0 2024-10-08 15:26:06,848 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.58 vs. limit=15.0 2024-10-08 15:26:16,683 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=339168.0, ans=0.015 2024-10-08 15:26:27,791 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=339168.0, ans=0.125 2024-10-08 15:26:53,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=339408.0, ans=0.1 2024-10-08 15:27:11,029 INFO [train.py:1136] (0/2) Epoch 35, batch 550, loss[loss=0.189, simple_loss=0.2867, pruned_loss=0.04569, over 87034.00 frames. ], tot_loss[loss=0.1778, simple_loss=0.2776, pruned_loss=0.03901, over 16019593.33 frames. 
], batch size: 583, lr: 7.29e-03, grad_scale: 32.0 2024-10-08 15:27:11,676 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=339528.0, ans=0.125 2024-10-08 15:27:24,357 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.586e+02 4.060e+02 4.422e+02 4.994e+02 7.991e+02, threshold=8.845e+02, percent-clipped=0.0 2024-10-08 15:27:58,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=339768.0, ans=0.125 2024-10-08 15:27:58,972 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=339768.0, ans=0.0 2024-10-08 15:28:05,770 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=339888.0, ans=0.125 2024-10-08 15:28:07,393 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=339888.0, ans=0.025 2024-10-08 15:28:17,553 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=339888.0, ans=0.125 2024-10-08 15:28:19,957 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.31 vs. limit=10.0 2024-10-08 15:28:44,023 INFO [train.py:1136] (0/2) Epoch 35, batch 600, loss[loss=0.1811, simple_loss=0.2798, pruned_loss=0.04113, over 86817.00 frames. ], tot_loss[loss=0.1769, simple_loss=0.2766, pruned_loss=0.03859, over 16293556.17 frames. ], batch size: 547, lr: 7.28e-03, grad_scale: 32.0 2024-10-08 15:29:46,685 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=340488.0, ans=0.09899494936611666 2024-10-08 15:29:47,218 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.50 vs. limit=15.0 2024-10-08 15:29:58,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=340608.0, ans=0.025 2024-10-08 15:30:06,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=340608.0, ans=0.2 2024-10-08 15:30:07,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=340608.0, ans=0.1 2024-10-08 15:30:19,415 INFO [train.py:1136] (0/2) Epoch 35, batch 650, loss[loss=0.1631, simple_loss=0.267, pruned_loss=0.02959, over 87300.00 frames. ], tot_loss[loss=0.1767, simple_loss=0.2766, pruned_loss=0.03838, over 16470475.15 frames. ], batch size: 393, lr: 7.28e-03, grad_scale: 32.0 2024-10-08 15:30:21,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=340728.0, ans=0.1 2024-10-08 15:30:30,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=340728.0, ans=0.125 2024-10-08 15:30:32,642 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.70 vs. 
limit=15.0 2024-10-08 15:30:33,404 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.528e+02 4.084e+02 4.655e+02 5.554e+02 9.310e+02, threshold=9.310e+02, percent-clipped=2.0 2024-10-08 15:30:33,744 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=340728.0, ans=0.05 2024-10-08 15:30:47,395 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=340848.0, ans=0.07 2024-10-08 15:30:59,834 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=340968.0, ans=0.125 2024-10-08 15:31:18,034 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=341088.0, ans=0.5 2024-10-08 15:31:21,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=341088.0, ans=0.0 2024-10-08 15:31:25,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=341208.0, ans=0.125 2024-10-08 15:31:42,434 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.32 vs. limit=12.0 2024-10-08 15:31:44,785 INFO [train.py:1136] (0/2) Epoch 35, batch 700, loss[loss=0.2079, simple_loss=0.3068, pruned_loss=0.05449, over 78585.00 frames. ], tot_loss[loss=0.1767, simple_loss=0.2767, pruned_loss=0.03832, over 16609472.40 frames. ], batch size: 1493, lr: 7.27e-03, grad_scale: 32.0 2024-10-08 15:31:57,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=341328.0, ans=0.125 2024-10-08 15:32:36,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=341688.0, ans=0.125 2024-10-08 15:33:03,071 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=341808.0, ans=0.09899494936611666 2024-10-08 15:33:07,525 INFO [train.py:1136] (0/2) Epoch 35, batch 750, loss[loss=0.2002, simple_loss=0.2967, pruned_loss=0.05178, over 69476.00 frames. ], tot_loss[loss=0.1771, simple_loss=0.2773, pruned_loss=0.03843, over 16695101.29 frames. ], batch size: 1960, lr: 7.27e-03, grad_scale: 32.0 2024-10-08 15:33:21,176 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.321e+02 3.964e+02 4.431e+02 4.934e+02 6.725e+02, threshold=8.861e+02, percent-clipped=0.0 2024-10-08 15:33:36,283 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.53 vs. limit=22.5 2024-10-08 15:33:36,907 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=342048.0, ans=0.2 2024-10-08 15:33:53,661 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=342168.0, ans=0.125 2024-10-08 15:34:30,746 INFO [train.py:1136] (0/2) Epoch 35, batch 800, loss[loss=0.1712, simple_loss=0.2767, pruned_loss=0.03286, over 87347.00 frames. ], tot_loss[loss=0.1773, simple_loss=0.2777, pruned_loss=0.03844, over 16753665.16 frames. 
], batch size: 464, lr: 7.26e-03, grad_scale: 32.0 2024-10-08 15:34:55,890 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-35.pt 2024-10-08 15:35:46,906 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.79 vs. limit=15.0 2024-10-08 15:35:47,434 INFO [train.py:1136] (0/2) Epoch 36, batch 0, loss[loss=0.1689, simple_loss=0.2707, pruned_loss=0.03358, over 87416.00 frames. ], tot_loss[loss=0.1689, simple_loss=0.2707, pruned_loss=0.03358, over 87416.00 frames. ], batch size: 393, lr: 7.15e-03, grad_scale: 32.0 2024-10-08 15:35:47,435 INFO [train.py:1159] (0/2) Computing validation loss 2024-10-08 15:35:53,253 INFO [zipformer.py:1883] (0/2) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([6.3257, 5.6572, 6.1477, 5.6637], device='cuda:0') 2024-10-08 15:35:58,564 INFO [train.py:1168] (0/2) Epoch 36, validation: loss=0.1669, simple_loss=0.2763, pruned_loss=0.02874, over 1382211.00 frames. 2024-10-08 15:35:58,565 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB 2024-10-08 15:36:05,768 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=342720.0, ans=0.125 2024-10-08 15:36:18,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=342840.0, ans=0.025 2024-10-08 15:36:19,865 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=342840.0, ans=0.125 2024-10-08 15:36:19,983 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=342840.0, ans=0.125 2024-10-08 15:36:20,669 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.52 vs. limit=15.0 2024-10-08 15:36:36,526 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=342960.0, ans=0.0 2024-10-08 15:36:45,507 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-10-08 15:36:50,821 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=342960.0, ans=0.04949747468305833 2024-10-08 15:37:15,282 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0 2024-10-08 15:37:19,266 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.710e+02 4.297e+02 4.610e+02 5.151e+02 7.055e+02, threshold=9.221e+02, percent-clipped=0.0 2024-10-08 15:37:28,918 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.66 vs. limit=15.0 2024-10-08 15:37:32,946 INFO [train.py:1136] (0/2) Epoch 36, batch 50, loss[loss=0.1878, simple_loss=0.2862, pruned_loss=0.04466, over 86837.00 frames. ], tot_loss[loss=0.1789, simple_loss=0.2782, pruned_loss=0.03983, over 3821721.24 frames. 
], batch size: 547, lr: 7.15e-03, grad_scale: 32.0 2024-10-08 15:37:40,596 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-10-08 15:38:05,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=343440.0, ans=0.1 2024-10-08 15:38:10,874 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=343560.0, ans=0.125 2024-10-08 15:38:33,807 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-10-08 15:38:38,892 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=343680.0, ans=10.0 2024-10-08 15:38:44,661 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=343680.0, ans=0.125 2024-10-08 15:38:46,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=343680.0, ans=0.125 2024-10-08 15:38:49,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=343800.0, ans=0.125 2024-10-08 15:39:09,113 INFO [train.py:1136] (0/2) Epoch 36, batch 100, loss[loss=0.1679, simple_loss=0.2633, pruned_loss=0.03618, over 86372.00 frames. ], tot_loss[loss=0.1772, simple_loss=0.277, pruned_loss=0.03871, over 6794555.59 frames. ], batch size: 213, lr: 7.14e-03, grad_scale: 32.0 2024-10-08 15:39:24,977 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.09 vs. limit=15.0 2024-10-08 15:39:29,637 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=344040.0, ans=0.125 2024-10-08 15:39:50,769 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=344160.0, ans=0.2 2024-10-08 15:40:15,563 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=344280.0, ans=0.125 2024-10-08 15:40:27,164 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.430e+02 4.011e+02 4.434e+02 5.295e+02 7.235e+02, threshold=8.867e+02, percent-clipped=0.0 2024-10-08 15:40:28,441 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.96 vs. limit=15.0 2024-10-08 15:40:39,053 INFO [train.py:1136] (0/2) Epoch 36, batch 150, loss[loss=0.1821, simple_loss=0.2811, pruned_loss=0.04156, over 86820.00 frames. ], tot_loss[loss=0.1763, simple_loss=0.2761, pruned_loss=0.03819, over 9122402.23 frames. 
], batch size: 547, lr: 7.14e-03, grad_scale: 32.0 2024-10-08 15:40:39,394 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=344520.0, ans=0.1 2024-10-08 15:41:12,218 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=344640.0, ans=0.0 2024-10-08 15:41:58,932 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=345000.0, ans=0.025 2024-10-08 15:42:10,946 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=345000.0, ans=0.025 2024-10-08 15:42:11,057 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=345000.0, ans=0.1 2024-10-08 15:42:15,665 INFO [train.py:1136] (0/2) Epoch 36, batch 200, loss[loss=0.1847, simple_loss=0.2915, pruned_loss=0.03896, over 83420.00 frames. ], tot_loss[loss=0.1772, simple_loss=0.2774, pruned_loss=0.03849, over 10877787.72 frames. ], batch size: 1077, lr: 7.13e-03, grad_scale: 32.0 2024-10-08 15:42:28,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=345120.0, ans=0.125 2024-10-08 15:43:13,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=345480.0, ans=0.0 2024-10-08 15:43:38,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=345600.0, ans=0.125 2024-10-08 15:43:39,736 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.618e+02 4.141e+02 4.629e+02 5.292e+02 7.292e+02, threshold=9.259e+02, percent-clipped=0.0 2024-10-08 15:43:51,898 INFO [train.py:1136] (0/2) Epoch 36, batch 250, loss[loss=0.1827, simple_loss=0.2887, pruned_loss=0.03839, over 85508.00 frames. ], tot_loss[loss=0.1772, simple_loss=0.2778, pruned_loss=0.03834, over 12258741.18 frames. ], batch size: 786, lr: 7.12e-03, grad_scale: 32.0 2024-10-08 15:44:07,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=345720.0, ans=0.0 2024-10-08 15:45:12,104 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=346200.0, ans=0.0 2024-10-08 15:45:16,178 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=346200.0, ans=0.2 2024-10-08 15:45:27,971 INFO [train.py:1136] (0/2) Epoch 36, batch 300, loss[loss=0.1675, simple_loss=0.257, pruned_loss=0.03902, over 85828.00 frames. ], tot_loss[loss=0.1764, simple_loss=0.2771, pruned_loss=0.03789, over 13361654.99 frames. ], batch size: 180, lr: 7.12e-03, grad_scale: 32.0 2024-10-08 15:46:46,244 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.425e+02 4.081e+02 4.463e+02 4.949e+02 7.126e+02, threshold=8.925e+02, percent-clipped=0.0 2024-10-08 15:47:02,455 INFO [train.py:1136] (0/2) Epoch 36, batch 350, loss[loss=0.1716, simple_loss=0.2714, pruned_loss=0.03594, over 87331.00 frames. ], tot_loss[loss=0.176, simple_loss=0.2767, pruned_loss=0.03768, over 14226339.41 frames. ], batch size: 393, lr: 7.11e-03, grad_scale: 32.0 2024-10-08 15:47:44,683 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.37 vs. 
limit=12.0 2024-10-08 15:47:55,626 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=347280.0, ans=0.0 2024-10-08 15:48:12,642 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=347400.0, ans=0.125 2024-10-08 15:48:12,672 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=347400.0, ans=0.125 2024-10-08 15:48:30,798 INFO [train.py:1136] (0/2) Epoch 36, batch 400, loss[loss=0.1751, simple_loss=0.2675, pruned_loss=0.04135, over 87295.00 frames. ], tot_loss[loss=0.1756, simple_loss=0.2764, pruned_loss=0.03747, over 14899066.53 frames. ], batch size: 280, lr: 7.11e-03, grad_scale: 32.0 2024-10-08 15:48:31,314 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=347520.0, ans=0.125 2024-10-08 15:48:45,596 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=347520.0, ans=0.025 2024-10-08 15:48:47,242 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-10-08 15:49:12,478 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.66 vs. limit=6.0 2024-10-08 15:49:38,628 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.47 vs. limit=15.0 2024-10-08 15:49:55,605 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.462e+02 4.096e+02 4.397e+02 5.086e+02 8.005e+02, threshold=8.794e+02, percent-clipped=0.0 2024-10-08 15:50:05,617 INFO [train.py:1136] (0/2) Epoch 36, batch 450, loss[loss=0.1747, simple_loss=0.273, pruned_loss=0.03821, over 87183.00 frames. ], tot_loss[loss=0.1753, simple_loss=0.2761, pruned_loss=0.03728, over 15422884.41 frames. ], batch size: 313, lr: 7.10e-03, grad_scale: 16.0 2024-10-08 15:50:11,399 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=348120.0, ans=0.1 2024-10-08 15:50:45,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=348360.0, ans=0.0 2024-10-08 15:50:52,950 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.49 vs. limit=15.0 2024-10-08 15:51:04,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=348480.0, ans=0.015 2024-10-08 15:51:41,423 INFO [train.py:1136] (0/2) Epoch 36, batch 500, loss[loss=0.169, simple_loss=0.2738, pruned_loss=0.03209, over 87312.00 frames. ], tot_loss[loss=0.1754, simple_loss=0.2762, pruned_loss=0.0373, over 15789371.41 frames. ], batch size: 517, lr: 7.10e-03, grad_scale: 8.0 2024-10-08 15:52:03,364 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.47 vs. 
limit=15.0 2024-10-08 15:52:11,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=348840.0, ans=0.025 2024-10-08 15:52:12,466 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.99 vs. limit=15.0 2024-10-08 15:52:54,840 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.34 vs. limit=15.0 2024-10-08 15:52:56,641 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.12 vs. limit=12.0 2024-10-08 15:53:06,268 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.377e+02 3.998e+02 4.446e+02 5.028e+02 1.281e+03, threshold=8.892e+02, percent-clipped=1.0 2024-10-08 15:53:13,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=349320.0, ans=0.125 2024-10-08 15:53:15,037 INFO [train.py:1136] (0/2) Epoch 36, batch 550, loss[loss=0.1821, simple_loss=0.2872, pruned_loss=0.0385, over 84436.00 frames. ], tot_loss[loss=0.1755, simple_loss=0.2761, pruned_loss=0.03747, over 16087558.53 frames. ], batch size: 957, lr: 7.09e-03, grad_scale: 8.0 2024-10-08 15:53:17,484 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=349320.0, ans=0.0 2024-10-08 15:54:32,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=349800.0, ans=0.125 2024-10-08 15:54:47,845 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=349800.0, ans=0.0 2024-10-08 15:54:50,827 INFO [train.py:1136] (0/2) Epoch 36, batch 600, loss[loss=0.1691, simple_loss=0.2711, pruned_loss=0.0335, over 87329.00 frames. ], tot_loss[loss=0.1759, simple_loss=0.2765, pruned_loss=0.03763, over 16303649.19 frames. ], batch size: 372, lr: 7.08e-03, grad_scale: 8.0 2024-10-08 15:55:10,674 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=350040.0, ans=0.0 2024-10-08 15:55:16,626 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.80 vs. 
limit=15.0 2024-10-08 15:55:30,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=350160.0, ans=0.125 2024-10-08 15:55:37,168 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=350160.0, ans=0.125 2024-10-08 15:55:44,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=350160.0, ans=0.1 2024-10-08 15:55:49,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=350280.0, ans=0.1 2024-10-08 15:56:18,247 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.348e+02 4.219e+02 4.587e+02 5.130e+02 7.423e+02, threshold=9.175e+02, percent-clipped=0.0 2024-10-08 15:56:20,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=350400.0, ans=0.025 2024-10-08 15:56:22,882 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.37 vs. limit=15.0 2024-10-08 15:56:26,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=350520.0, ans=0.0 2024-10-08 15:56:27,465 INFO [train.py:1136] (0/2) Epoch 36, batch 650, loss[loss=0.1905, simple_loss=0.2938, pruned_loss=0.04357, over 81900.00 frames. ], tot_loss[loss=0.176, simple_loss=0.2764, pruned_loss=0.03778, over 16469367.85 frames. ], batch size: 1245, lr: 7.08e-03, grad_scale: 8.0 2024-10-08 15:56:44,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=350520.0, ans=0.0 2024-10-08 15:56:50,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=350640.0, ans=0.1 2024-10-08 15:56:57,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=350640.0, ans=0.125 2024-10-08 15:56:59,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=350640.0, ans=0.0 2024-10-08 15:57:18,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=350760.0, ans=0.5 2024-10-08 15:57:53,424 INFO [train.py:1136] (0/2) Epoch 36, batch 700, loss[loss=0.1693, simple_loss=0.276, pruned_loss=0.03133, over 87184.00 frames. ], tot_loss[loss=0.1756, simple_loss=0.2762, pruned_loss=0.03749, over 16626643.30 frames. ], batch size: 517, lr: 7.07e-03, grad_scale: 8.0 2024-10-08 15:58:08,320 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.63 vs. 
limit=15.0 2024-10-08 15:58:49,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=351480.0, ans=0.2 2024-10-08 15:58:51,147 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=351480.0, ans=0.0 2024-10-08 15:58:59,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=351600.0, ans=0.1 2024-10-08 15:59:05,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=351600.0, ans=0.2 2024-10-08 15:59:05,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=351600.0, ans=0.125 2024-10-08 15:59:09,755 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.586e+02 4.288e+02 4.944e+02 5.673e+02 8.180e+02, threshold=9.888e+02, percent-clipped=0.0 2024-10-08 15:59:18,625 INFO [train.py:1136] (0/2) Epoch 36, batch 750, loss[loss=0.1893, simple_loss=0.2924, pruned_loss=0.04311, over 84463.00 frames. ], tot_loss[loss=0.1764, simple_loss=0.277, pruned_loss=0.03787, over 16686872.75 frames. ], batch size: 957, lr: 7.07e-03, grad_scale: 8.0 2024-10-08 15:59:20,484 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=351720.0, ans=0.09899494936611666 2024-10-08 15:59:31,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=351720.0, ans=0.1 2024-10-08 15:59:50,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=351960.0, ans=0.2 2024-10-08 15:59:55,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=351960.0, ans=0.125 2024-10-08 16:00:05,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=352080.0, ans=0.125 2024-10-08 16:00:12,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=352080.0, ans=0.125 2024-10-08 16:00:32,332 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=352200.0, ans=0.0 2024-10-08 16:00:39,814 INFO [train.py:1136] (0/2) Epoch 36, batch 800, loss[loss=0.1898, simple_loss=0.2924, pruned_loss=0.04363, over 82000.00 frames. ], tot_loss[loss=0.176, simple_loss=0.2763, pruned_loss=0.03786, over 16751662.10 frames. ], batch size: 1245, lr: 7.06e-03, grad_scale: 16.0 2024-10-08 16:00:40,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=352320.0, ans=0.125 2024-10-08 16:00:52,433 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.61 vs. limit=12.0 2024-10-08 16:01:05,051 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-36.pt 2024-10-08 16:01:46,630 INFO [train.py:1136] (0/2) Epoch 37, batch 0, loss[loss=0.1799, simple_loss=0.2821, pruned_loss=0.03884, over 87095.00 frames. ], tot_loss[loss=0.1799, simple_loss=0.2821, pruned_loss=0.03884, over 87095.00 frames. 
], batch size: 583, lr: 6.96e-03, grad_scale: 32.0 2024-10-08 16:01:46,632 INFO [train.py:1159] (0/2) Computing validation loss 2024-10-08 16:01:57,570 INFO [train.py:1168] (0/2) Epoch 37, validation: loss=0.1666, simple_loss=0.2758, pruned_loss=0.02869, over 1382211.00 frames. 2024-10-08 16:01:57,570 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB 2024-10-08 16:02:01,501 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=352512.0, ans=0.125 2024-10-08 16:02:01,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=352512.0, ans=0.025 2024-10-08 16:02:04,857 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=352512.0, ans=0.0 2024-10-08 16:02:04,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=352512.0, ans=0.1 2024-10-08 16:02:28,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=352632.0, ans=0.125 2024-10-08 16:02:31,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=352632.0, ans=0.125 2024-10-08 16:02:42,871 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.84 vs. limit=10.0 2024-10-08 16:02:50,321 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.477e+02 4.142e+02 4.785e+02 5.515e+02 8.247e+02, threshold=9.571e+02, percent-clipped=0.0 2024-10-08 16:03:08,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=352872.0, ans=0.0 2024-10-08 16:03:18,824 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=352992.0, ans=0.1 2024-10-08 16:03:22,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=352992.0, ans=0.0 2024-10-08 16:03:30,317 INFO [train.py:1136] (0/2) Epoch 37, batch 50, loss[loss=0.18, simple_loss=0.2702, pruned_loss=0.04492, over 87233.00 frames. ], tot_loss[loss=0.1781, simple_loss=0.2791, pruned_loss=0.03859, over 3853746.69 frames. 
], batch size: 280, lr: 6.96e-03, grad_scale: 32.0 2024-10-08 16:03:41,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=353112.0, ans=0.125 2024-10-08 16:03:50,725 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=353232.0, ans=0.0 2024-10-08 16:03:50,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=353232.0, ans=0.0 2024-10-08 16:03:52,390 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=353232.0, ans=0.05 2024-10-08 16:04:05,874 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=353232.0, ans=0.0 2024-10-08 16:04:23,200 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-10-08 16:04:47,041 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. limit=6.0 2024-10-08 16:04:56,705 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=353592.0, ans=0.125 2024-10-08 16:05:05,004 INFO [train.py:1136] (0/2) Epoch 37, batch 100, loss[loss=0.1721, simple_loss=0.2704, pruned_loss=0.03693, over 87236.00 frames. ], tot_loss[loss=0.1769, simple_loss=0.2783, pruned_loss=0.03775, over 6778311.28 frames. ], batch size: 330, lr: 6.95e-03, grad_scale: 32.0 2024-10-08 16:05:58,874 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.521e+02 4.067e+02 4.555e+02 5.241e+02 7.482e+02, threshold=9.110e+02, percent-clipped=0.0 2024-10-08 16:06:08,822 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.84 vs. limit=15.0 2024-10-08 16:06:18,233 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.81 vs. limit=15.0 2024-10-08 16:06:34,315 INFO [train.py:1136] (0/2) Epoch 37, batch 150, loss[loss=0.1858, simple_loss=0.2845, pruned_loss=0.04352, over 86704.00 frames. ], tot_loss[loss=0.1753, simple_loss=0.276, pruned_loss=0.03736, over 9113506.83 frames. ], batch size: 547, lr: 6.95e-03, grad_scale: 32.0 2024-10-08 16:06:52,135 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.14 vs. limit=15.0 2024-10-08 16:08:09,270 INFO [train.py:1136] (0/2) Epoch 37, batch 200, loss[loss=0.1883, simple_loss=0.2881, pruned_loss=0.0442, over 86409.00 frames. ], tot_loss[loss=0.1753, simple_loss=0.2759, pruned_loss=0.03735, over 10896550.02 frames. ], batch size: 667, lr: 6.94e-03, grad_scale: 16.0 2024-10-08 16:09:04,416 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.509e+02 3.878e+02 4.351e+02 5.122e+02 6.947e+02, threshold=8.702e+02, percent-clipped=0.0 2024-10-08 16:09:06,999 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=355272.0, ans=0.2 2024-10-08 16:09:42,160 INFO [train.py:1136] (0/2) Epoch 37, batch 250, loss[loss=0.1852, simple_loss=0.2869, pruned_loss=0.04176, over 86400.00 frames. 
], tot_loss[loss=0.1751, simple_loss=0.2758, pruned_loss=0.03717, over 12287724.68 frames. ], batch size: 667, lr: 6.93e-03, grad_scale: 16.0 2024-10-08 16:09:48,043 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=355512.0, ans=0.125 2024-10-08 16:10:36,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=355872.0, ans=0.025 2024-10-08 16:10:41,671 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=355872.0, ans=0.2 2024-10-08 16:11:14,954 INFO [train.py:1136] (0/2) Epoch 37, batch 300, loss[loss=0.1758, simple_loss=0.2732, pruned_loss=0.03923, over 87394.00 frames. ], tot_loss[loss=0.1746, simple_loss=0.2753, pruned_loss=0.03701, over 13376121.34 frames. ], batch size: 280, lr: 6.93e-03, grad_scale: 16.0 2024-10-08 16:11:50,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=356352.0, ans=0.125 2024-10-08 16:12:11,331 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.360e+02 3.998e+02 4.449e+02 5.062e+02 8.313e+02, threshold=8.898e+02, percent-clipped=0.0 2024-10-08 16:12:42,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=356592.0, ans=0.1 2024-10-08 16:12:48,616 INFO [train.py:1136] (0/2) Epoch 37, batch 350, loss[loss=0.1873, simple_loss=0.2933, pruned_loss=0.04063, over 83333.00 frames. ], tot_loss[loss=0.1755, simple_loss=0.2758, pruned_loss=0.03753, over 14203120.12 frames. ], batch size: 1077, lr: 6.92e-03, grad_scale: 16.0 2024-10-08 16:12:55,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=356712.0, ans=0.1 2024-10-08 16:13:01,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=356712.0, ans=0.0 2024-10-08 16:14:04,375 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=357192.0, ans=0.125 2024-10-08 16:14:11,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=357192.0, ans=0.0 2024-10-08 16:14:18,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=357192.0, ans=0.125 2024-10-08 16:14:23,480 INFO [train.py:1136] (0/2) Epoch 37, batch 400, loss[loss=0.1795, simple_loss=0.2821, pruned_loss=0.03845, over 86437.00 frames. ], tot_loss[loss=0.1749, simple_loss=0.2754, pruned_loss=0.03724, over 14862043.05 frames. ], batch size: 620, lr: 6.92e-03, grad_scale: 32.0 2024-10-08 16:14:25,925 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=357312.0, ans=0.07 2024-10-08 16:14:32,812 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=357312.0, ans=0.0 2024-10-08 16:14:37,413 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.59 vs. 
limit=15.0 2024-10-08 16:14:54,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=357432.0, ans=0.1 2024-10-08 16:15:02,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=357552.0, ans=0.1 2024-10-08 16:15:21,005 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.377e+02 4.263e+02 4.794e+02 5.511e+02 9.579e+02, threshold=9.587e+02, percent-clipped=1.0 2024-10-08 16:15:26,702 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=357672.0, ans=0.2 2024-10-08 16:15:40,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=357792.0, ans=0.125 2024-10-08 16:15:59,956 INFO [train.py:1136] (0/2) Epoch 37, batch 450, loss[loss=0.1677, simple_loss=0.2664, pruned_loss=0.03452, over 86966.00 frames. ], tot_loss[loss=0.1749, simple_loss=0.2756, pruned_loss=0.03715, over 15363010.80 frames. ], batch size: 350, lr: 6.91e-03, grad_scale: 32.0 2024-10-08 16:16:15,870 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=358032.0, ans=0.1 2024-10-08 16:16:24,274 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=358032.0, ans=0.125 2024-10-08 16:16:38,392 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=358152.0, ans=0.125 2024-10-08 16:17:04,613 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=358272.0, ans=0.125 2024-10-08 16:17:10,367 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.72 vs. limit=22.5 2024-10-08 16:17:10,402 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.74 vs. limit=22.5 2024-10-08 16:17:31,024 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=358512.0, ans=0.125 2024-10-08 16:17:32,355 INFO [train.py:1136] (0/2) Epoch 37, batch 500, loss[loss=0.1706, simple_loss=0.277, pruned_loss=0.03211, over 87319.00 frames. ], tot_loss[loss=0.1744, simple_loss=0.275, pruned_loss=0.03695, over 15773762.59 frames. ], batch size: 393, lr: 6.91e-03, grad_scale: 16.0 2024-10-08 16:17:39,814 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=358512.0, ans=0.09899494936611666 2024-10-08 16:18:29,931 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.44 vs. 
limit=22.5 2024-10-08 16:18:30,328 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.612e+02 4.056e+02 4.422e+02 5.060e+02 6.860e+02, threshold=8.843e+02, percent-clipped=0.0 2024-10-08 16:18:37,496 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=358872.0, ans=0.125 2024-10-08 16:18:57,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=358992.0, ans=0.125 2024-10-08 16:19:08,092 INFO [train.py:1136] (0/2) Epoch 37, batch 550, loss[loss=0.1865, simple_loss=0.2911, pruned_loss=0.04098, over 83460.00 frames. ], tot_loss[loss=0.1748, simple_loss=0.2754, pruned_loss=0.03704, over 16078721.80 frames. ], batch size: 1077, lr: 6.90e-03, grad_scale: 16.0 2024-10-08 16:19:25,540 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=359232.0, ans=0.0 2024-10-08 16:19:43,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=359352.0, ans=0.05 2024-10-08 16:20:06,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=359472.0, ans=0.125 2024-10-08 16:20:26,473 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=359592.0, ans=0.0 2024-10-08 16:20:44,226 INFO [train.py:1136] (0/2) Epoch 37, batch 600, loss[loss=0.1777, simple_loss=0.2719, pruned_loss=0.04177, over 87123.00 frames. ], tot_loss[loss=0.1752, simple_loss=0.2758, pruned_loss=0.03731, over 16277371.24 frames. ], batch size: 280, lr: 6.90e-03, grad_scale: 16.0 2024-10-08 16:20:47,214 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.80 vs. limit=15.0 2024-10-08 16:21:19,825 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=359952.0, ans=0.0 2024-10-08 16:21:28,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=359952.0, ans=0.125 2024-10-08 16:21:42,354 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.469e+02 3.953e+02 4.282e+02 4.985e+02 8.427e+02, threshold=8.563e+02, percent-clipped=0.0 2024-10-08 16:21:42,780 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=360072.0, ans=0.125 2024-10-08 16:22:06,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=360192.0, ans=0.07 2024-10-08 16:22:17,256 INFO [train.py:1136] (0/2) Epoch 37, batch 650, loss[loss=0.1641, simple_loss=0.2701, pruned_loss=0.02907, over 87368.00 frames. ], tot_loss[loss=0.175, simple_loss=0.2754, pruned_loss=0.03724, over 16471787.88 frames. ], batch size: 439, lr: 6.89e-03, grad_scale: 16.0 2024-10-08 16:22:17,485 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=360312.0, ans=0.125 2024-10-08 16:22:19,770 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.89 vs. 
limit=15.0 2024-10-08 16:22:19,821 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.78 vs. limit=22.5 2024-10-08 16:22:52,798 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=360552.0, ans=0.125 2024-10-08 16:23:03,495 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.93 vs. limit=15.0 2024-10-08 16:23:32,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=360792.0, ans=0.04949747468305833 2024-10-08 16:23:40,121 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-10-08 16:23:43,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=360792.0, ans=0.0 2024-10-08 16:23:47,832 INFO [train.py:1136] (0/2) Epoch 37, batch 700, loss[loss=0.1655, simple_loss=0.2627, pruned_loss=0.03413, over 86610.00 frames. ], tot_loss[loss=0.1747, simple_loss=0.2752, pruned_loss=0.03715, over 16635015.85 frames. ], batch size: 229, lr: 6.89e-03, grad_scale: 16.0 2024-10-08 16:23:53,000 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=360912.0, ans=0.025 2024-10-08 16:23:54,750 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=360912.0, ans=0.0 2024-10-08 16:23:59,768 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.13 vs. limit=22.5 2024-10-08 16:24:02,591 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=361032.0, ans=0.04949747468305833 2024-10-08 16:24:03,283 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.20 vs. limit=15.0 2024-10-08 16:24:13,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=361032.0, ans=0.125 2024-10-08 16:24:28,079 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=361152.0, ans=0.125 2024-10-08 16:24:39,076 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.475e+02 4.016e+02 4.343e+02 4.924e+02 7.952e+02, threshold=8.686e+02, percent-clipped=0.0 2024-10-08 16:24:39,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=361272.0, ans=0.125 2024-10-08 16:24:58,448 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=361392.0, ans=0.125 2024-10-08 16:25:04,671 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=361392.0, ans=0.0 2024-10-08 16:25:09,776 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=361512.0, ans=0.125 2024-10-08 16:25:11,049 INFO [train.py:1136] (0/2) Epoch 37, batch 750, loss[loss=0.1962, simple_loss=0.2917, pruned_loss=0.05033, over 69419.00 frames. 
], tot_loss[loss=0.1755, simple_loss=0.276, pruned_loss=0.03751, over 16702413.74 frames. ], batch size: 1960, lr: 6.88e-03, grad_scale: 16.0 2024-10-08 16:26:07,397 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.38 vs. limit=22.5 2024-10-08 16:26:23,422 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-10-08 16:26:33,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=362112.0, ans=0.125 2024-10-08 16:26:34,925 INFO [train.py:1136] (0/2) Epoch 37, batch 800, loss[loss=0.1678, simple_loss=0.2657, pruned_loss=0.03491, over 86665.00 frames. ], tot_loss[loss=0.176, simple_loss=0.2765, pruned_loss=0.03781, over 16754488.35 frames. ], batch size: 246, lr: 6.87e-03, grad_scale: 32.0 2024-10-08 16:26:51,115 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=362232.0, ans=0.0 2024-10-08 16:26:57,865 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=362232.0, ans=0.125 2024-10-08 16:27:01,907 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-37.pt 2024-10-08 16:27:39,313 INFO [train.py:1136] (0/2) Epoch 38, batch 0, loss[loss=0.1694, simple_loss=0.2694, pruned_loss=0.03469, over 87314.00 frames. ], tot_loss[loss=0.1694, simple_loss=0.2694, pruned_loss=0.03469, over 87314.00 frames. ], batch size: 313, lr: 6.78e-03, grad_scale: 32.0 2024-10-08 16:27:39,314 INFO [train.py:1159] (0/2) Computing validation loss 2024-10-08 16:27:50,269 INFO [train.py:1168] (0/2) Epoch 38, validation: loss=0.1678, simple_loss=0.2777, pruned_loss=0.02895, over 1382211.00 frames. 2024-10-08 16:27:50,269 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB 2024-10-08 16:28:09,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=362424.0, ans=0.125 2024-10-08 16:28:19,138 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.562e+02 4.302e+02 5.000e+02 5.722e+02 8.483e+02, threshold=1.000e+03, percent-clipped=0.0 2024-10-08 16:29:02,574 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.43 vs. limit=12.0 2024-10-08 16:29:09,821 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.79 vs. limit=22.5 2024-10-08 16:29:24,188 INFO [train.py:1136] (0/2) Epoch 38, batch 50, loss[loss=0.1832, simple_loss=0.2881, pruned_loss=0.03919, over 85706.00 frames. ], tot_loss[loss=0.1743, simple_loss=0.2745, pruned_loss=0.03704, over 3852350.33 frames. ], batch size: 721, lr: 6.78e-03, grad_scale: 32.0 2024-10-08 16:29:44,989 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=363024.0, ans=0.0 2024-10-08 16:29:57,499 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=363024.0, ans=0.125 2024-10-08 16:30:06,719 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.15 vs. 
limit=15.0 2024-10-08 16:30:16,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=363264.0, ans=0.125 2024-10-08 16:30:52,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=363384.0, ans=0.125 2024-10-08 16:30:57,332 INFO [train.py:1136] (0/2) Epoch 38, batch 100, loss[loss=0.1994, simple_loss=0.3, pruned_loss=0.04934, over 78638.00 frames. ], tot_loss[loss=0.1738, simple_loss=0.2738, pruned_loss=0.0369, over 6808840.43 frames. ], batch size: 1493, lr: 6.77e-03, grad_scale: 32.0 2024-10-08 16:31:00,482 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.85 vs. limit=10.0 2024-10-08 16:31:08,050 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=363504.0, ans=0.2 2024-10-08 16:31:08,234 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=363504.0, ans=0.0 2024-10-08 16:31:23,363 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.421e+02 4.156e+02 4.544e+02 5.045e+02 9.088e+02, threshold=9.087e+02, percent-clipped=0.0 2024-10-08 16:31:56,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=363864.0, ans=0.025 2024-10-08 16:32:31,987 INFO [train.py:1136] (0/2) Epoch 38, batch 150, loss[loss=0.1831, simple_loss=0.2808, pruned_loss=0.0427, over 86998.00 frames. ], tot_loss[loss=0.1735, simple_loss=0.2731, pruned_loss=0.03691, over 9054967.56 frames. ], batch size: 548, lr: 6.76e-03, grad_scale: 8.0 2024-10-08 16:33:10,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=364344.0, ans=0.025 2024-10-08 16:33:18,312 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.72 vs. limit=10.0 2024-10-08 16:33:23,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=364344.0, ans=0.125 2024-10-08 16:34:11,148 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.93 vs. limit=15.0 2024-10-08 16:34:11,624 INFO [train.py:1136] (0/2) Epoch 38, batch 200, loss[loss=0.1943, simple_loss=0.2901, pruned_loss=0.04925, over 69660.00 frames. ], tot_loss[loss=0.1745, simple_loss=0.2746, pruned_loss=0.03721, over 10774937.62 frames. ], batch size: 1960, lr: 6.76e-03, grad_scale: 8.0 2024-10-08 16:34:41,312 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.572e+02 4.079e+02 4.599e+02 5.241e+02 8.347e+02, threshold=9.197e+02, percent-clipped=0.0 2024-10-08 16:34:53,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=364944.0, ans=0.125 2024-10-08 16:35:42,726 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.82 vs. limit=15.0 2024-10-08 16:35:43,231 INFO [train.py:1136] (0/2) Epoch 38, batch 250, loss[loss=0.1928, simple_loss=0.2895, pruned_loss=0.04801, over 69489.00 frames. ], tot_loss[loss=0.1746, simple_loss=0.2749, pruned_loss=0.03718, over 12166122.82 frames. 
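The periodic optim.py:487 warnings summarize gradient-norm clipping: the optimizer keeps a window of recent gradient norms, prints their quartiles (min / 25% / median / 75% / max), derives the clipping threshold from that distribution, and reports the share of batches actually clipped. In the record just above, threshold=9.087e+02 is `Clipping_scale=2.0` times the median quartile 4.544e+02 (up to rounding), so a scale-times-median rule matches this log. The following is a minimal sketch of that bookkeeping, not icefall's ScaledAdam code, which differs in detail.

```python
import collections
import torch

class QuartileClipper:
    """Clip global grad norm at clipping_scale * median of recent norms."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 200):
        self.clipping_scale = clipping_scale
        self.norms = collections.deque(maxlen=window)  # recent grad norms
        self.num_steps = 0
        self.num_clipped = 0

    def clip_(self, params) -> float:
        grads = [p.grad.reshape(-1) for p in params if p.grad is not None]
        norm = torch.linalg.vector_norm(torch.cat(grads)).item()
        self.norms.append(norm)
        hist = torch.tensor(list(self.norms))
        q = torch.quantile(hist, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()  # scale * median
        self.num_steps += 1
        if norm > threshold > 0:
            self.num_clipped += 1
            for p in params:
                if p.grad is not None:
                    p.grad.mul_(threshold / norm)      # rescale in place
        print(f"grad-norm quartiles {q.tolist()}, threshold={threshold:.3e}, "
              f"percent-clipped={100.0 * self.num_clipped / self.num_steps:.1f}")
        return norm
```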
], batch size: 1960, lr: 6.75e-03, grad_scale: 8.0 2024-10-08 16:36:24,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=365544.0, ans=0.0 2024-10-08 16:36:25,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=365544.0, ans=0.125 2024-10-08 16:36:37,777 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=365664.0, ans=0.5 2024-10-08 16:36:48,598 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.14 vs. limit=22.5 2024-10-08 16:37:18,547 INFO [train.py:1136] (0/2) Epoch 38, batch 300, loss[loss=0.1801, simple_loss=0.283, pruned_loss=0.03857, over 86347.00 frames. ], tot_loss[loss=0.1742, simple_loss=0.2747, pruned_loss=0.03689, over 13259717.79 frames. ], batch size: 667, lr: 6.75e-03, grad_scale: 8.0 2024-10-08 16:37:47,283 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.401e+02 3.958e+02 4.314e+02 4.999e+02 1.503e+03, threshold=8.628e+02, percent-clipped=1.0 2024-10-08 16:38:22,171 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=366264.0, ans=0.1 2024-10-08 16:38:47,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=366384.0, ans=0.04949747468305833 2024-10-08 16:38:54,700 INFO [train.py:1136] (0/2) Epoch 38, batch 350, loss[loss=0.1904, simple_loss=0.2895, pruned_loss=0.04568, over 69149.00 frames. ], tot_loss[loss=0.1746, simple_loss=0.2751, pruned_loss=0.03704, over 14104078.13 frames. ], batch size: 1960, lr: 6.74e-03, grad_scale: 8.0 2024-10-08 16:39:07,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=366504.0, ans=0.125 2024-10-08 16:39:17,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=366624.0, ans=0.1 2024-10-08 16:40:10,308 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=366984.0, ans=0.0 2024-10-08 16:40:15,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=366984.0, ans=0.125 2024-10-08 16:40:17,264 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=366984.0, ans=0.0 2024-10-08 16:40:20,753 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=366984.0, ans=0.0 2024-10-08 16:40:26,231 INFO [train.py:1136] (0/2) Epoch 38, batch 400, loss[loss=0.1668, simple_loss=0.2733, pruned_loss=0.03018, over 87359.00 frames. ], tot_loss[loss=0.1747, simple_loss=0.2753, pruned_loss=0.03703, over 14780182.26 frames. 
], batch size: 464, lr: 6.74e-03, grad_scale: 16.0 2024-10-08 16:40:31,770 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=367104.0, ans=0.0 2024-10-08 16:40:35,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=367104.0, ans=0.2 2024-10-08 16:40:49,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=367224.0, ans=0.2 2024-10-08 16:40:55,958 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.512e+02 3.952e+02 4.288e+02 4.954e+02 8.265e+02, threshold=8.576e+02, percent-clipped=0.0 2024-10-08 16:41:14,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=367344.0, ans=0.0 2024-10-08 16:41:15,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.whiten.whitening_limit, batch_count=367344.0, ans=15.0 2024-10-08 16:41:28,006 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.53 vs. limit=22.5 2024-10-08 16:41:29,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=367464.0, ans=0.025 2024-10-08 16:41:30,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=367464.0, ans=0.04949747468305833 2024-10-08 16:41:57,890 INFO [train.py:1136] (0/2) Epoch 38, batch 450, loss[loss=0.1645, simple_loss=0.2701, pruned_loss=0.0295, over 87343.00 frames. ], tot_loss[loss=0.1741, simple_loss=0.2749, pruned_loss=0.0367, over 15302408.58 frames. ], batch size: 490, lr: 6.73e-03, grad_scale: 16.0 2024-10-08 16:41:59,736 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=367704.0, ans=0.125 2024-10-08 16:42:01,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=367704.0, ans=0.2 2024-10-08 16:42:28,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=367824.0, ans=0.1 2024-10-08 16:42:37,850 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.89 vs. limit=22.5 2024-10-08 16:42:39,712 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.66 vs. limit=12.0 2024-10-08 16:43:16,175 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=368184.0, ans=0.2 2024-10-08 16:43:32,403 INFO [train.py:1136] (0/2) Epoch 38, batch 500, loss[loss=0.1879, simple_loss=0.2851, pruned_loss=0.04534, over 86948.00 frames. ], tot_loss[loss=0.1745, simple_loss=0.2752, pruned_loss=0.03688, over 15733051.29 frames. ], batch size: 583, lr: 6.73e-03, grad_scale: 16.0 2024-10-08 16:43:51,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=368424.0, ans=0.125 2024-10-08 16:43:54,264 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.21 vs. 
limit=15.0 2024-10-08 16:44:02,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=368424.0, ans=0.0 2024-10-08 16:44:03,780 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.419e+02 3.986e+02 4.419e+02 4.813e+02 6.823e+02, threshold=8.837e+02, percent-clipped=0.0 2024-10-08 16:44:15,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=368544.0, ans=0.025 2024-10-08 16:44:17,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=368544.0, ans=0.2 2024-10-08 16:45:07,629 INFO [train.py:1136] (0/2) Epoch 38, batch 550, loss[loss=0.167, simple_loss=0.2728, pruned_loss=0.03059, over 87223.00 frames. ], tot_loss[loss=0.1739, simple_loss=0.2748, pruned_loss=0.03654, over 16072141.40 frames. ], batch size: 415, lr: 6.72e-03, grad_scale: 16.0 2024-10-08 16:45:34,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=369024.0, ans=0.125 2024-10-08 16:46:15,291 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.01 vs. limit=15.0 2024-10-08 16:46:23,274 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=369384.0, ans=0.0 2024-10-08 16:46:34,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=369384.0, ans=0.0 2024-10-08 16:46:37,884 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=369384.0, ans=0.125 2024-10-08 16:46:39,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=369504.0, ans=0.125 2024-10-08 16:46:40,706 INFO [train.py:1136] (0/2) Epoch 38, batch 600, loss[loss=0.1604, simple_loss=0.2594, pruned_loss=0.03075, over 86484.00 frames. ], tot_loss[loss=0.1739, simple_loss=0.2747, pruned_loss=0.0365, over 16320676.86 frames. ], batch size: 229, lr: 6.72e-03, grad_scale: 16.0 2024-10-08 16:47:11,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=369624.0, ans=0.0 2024-10-08 16:47:12,638 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.594e+02 4.086e+02 4.364e+02 5.051e+02 7.147e+02, threshold=8.728e+02, percent-clipped=0.0 2024-10-08 16:47:16,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=369744.0, ans=0.1 2024-10-08 16:47:22,031 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=369744.0, ans=0.025 2024-10-08 16:47:35,601 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.98 vs. limit=22.5 2024-10-08 16:48:14,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=369984.0, ans=0.0 2024-10-08 16:48:16,986 INFO [train.py:1136] (0/2) Epoch 38, batch 650, loss[loss=0.1702, simple_loss=0.2662, pruned_loss=0.03715, over 86544.00 frames. ], tot_loss[loss=0.1744, simple_loss=0.2753, pruned_loss=0.03674, over 16478572.06 frames. 
], batch size: 229, lr: 6.71e-03, grad_scale: 16.0 2024-10-08 16:48:34,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=370224.0, ans=0.2 2024-10-08 16:48:43,002 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=370224.0, ans=0.0 2024-10-08 16:49:00,491 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.64 vs. limit=12.0 2024-10-08 16:49:23,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=370464.0, ans=0.0 2024-10-08 16:49:43,418 INFO [train.py:1136] (0/2) Epoch 38, batch 700, loss[loss=0.187, simple_loss=0.2881, pruned_loss=0.04297, over 86436.00 frames. ], tot_loss[loss=0.1744, simple_loss=0.2754, pruned_loss=0.03675, over 16638429.49 frames. ], batch size: 620, lr: 6.71e-03, grad_scale: 16.0 2024-10-08 16:50:11,492 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=370824.0, ans=0.2 2024-10-08 16:50:12,710 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.446e+02 4.247e+02 4.549e+02 5.018e+02 7.492e+02, threshold=9.098e+02, percent-clipped=0.0 2024-10-08 16:50:26,804 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.46 vs. limit=15.0 2024-10-08 16:50:58,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=371184.0, ans=0.2 2024-10-08 16:51:04,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=371184.0, ans=0.125 2024-10-08 16:51:06,410 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.17 vs. limit=15.0 2024-10-08 16:51:07,381 INFO [train.py:1136] (0/2) Epoch 38, batch 750, loss[loss=0.1679, simple_loss=0.2704, pruned_loss=0.03272, over 87440.00 frames. ], tot_loss[loss=0.1748, simple_loss=0.2756, pruned_loss=0.03698, over 16737161.07 frames. ], batch size: 372, lr: 6.70e-03, grad_scale: 16.0 2024-10-08 16:51:07,674 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=371304.0, ans=0.125 2024-10-08 16:51:20,591 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.68 vs. limit=15.0 2024-10-08 16:51:28,057 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=371424.0, ans=0.125 2024-10-08 16:51:38,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=371424.0, ans=0.125 2024-10-08 16:51:40,845 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.45 vs. limit=15.0 2024-10-08 16:51:47,511 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. 
limit=6.0 2024-10-08 16:51:58,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=371664.0, ans=0.0 2024-10-08 16:52:03,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=371664.0, ans=0.0 2024-10-08 16:52:30,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=371904.0, ans=10.0 2024-10-08 16:52:31,428 INFO [train.py:1136] (0/2) Epoch 38, batch 800, loss[loss=0.169, simple_loss=0.2771, pruned_loss=0.03043, over 87314.00 frames. ], tot_loss[loss=0.1757, simple_loss=0.2766, pruned_loss=0.0374, over 16773059.56 frames. ], batch size: 464, lr: 6.70e-03, grad_scale: 32.0 2024-10-08 16:52:41,408 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.15 vs. limit=10.0 2024-10-08 16:52:58,326 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-38.pt 2024-10-08 16:53:36,464 INFO [train.py:1136] (0/2) Epoch 39, batch 0, loss[loss=0.1682, simple_loss=0.2694, pruned_loss=0.03356, over 87097.00 frames. ], tot_loss[loss=0.1682, simple_loss=0.2694, pruned_loss=0.03356, over 87097.00 frames. ], batch size: 330, lr: 6.61e-03, grad_scale: 32.0 2024-10-08 16:53:36,465 INFO [train.py:1159] (0/2) Computing validation loss 2024-10-08 16:53:46,245 INFO [zipformer.py:1883] (0/2) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.1395, 2.7263, 3.6980, 3.2690], device='cuda:0') 2024-10-08 16:53:47,373 INFO [train.py:1168] (0/2) Epoch 39, validation: loss=0.1676, simple_loss=0.2773, pruned_loss=0.02899, over 1382211.00 frames. 2024-10-08 16:53:47,374 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB 2024-10-08 16:53:47,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=372096.0, ans=0.0 2024-10-08 16:53:48,159 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.41 vs. limit=15.0 2024-10-08 16:53:48,958 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.732e+02 4.366e+02 4.790e+02 5.421e+02 1.432e+03, threshold=9.580e+02, percent-clipped=3.0 2024-10-08 16:54:02,113 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=2.82 vs. 
limit=15.0 2024-10-08 16:54:08,980 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=372216.0, ans=0.125 2024-10-08 16:54:10,504 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=372216.0, ans=0.0 2024-10-08 16:54:10,652 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=372216.0, ans=0.125 2024-10-08 16:54:36,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=372336.0, ans=0.125 2024-10-08 16:54:40,275 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=372336.0, ans=0.1 2024-10-08 16:54:46,922 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=372456.0, ans=0.0 2024-10-08 16:54:57,494 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=372456.0, ans=0.125 2024-10-08 16:55:23,097 INFO [train.py:1136] (0/2) Epoch 39, batch 50, loss[loss=0.1686, simple_loss=0.262, pruned_loss=0.03763, over 85767.00 frames. ], tot_loss[loss=0.1726, simple_loss=0.2736, pruned_loss=0.03578, over 3862159.34 frames. ], batch size: 180, lr: 6.60e-03, grad_scale: 32.0 2024-10-08 16:55:59,007 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=372936.0, ans=0.125 2024-10-08 16:55:59,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=372936.0, ans=0.07 2024-10-08 16:56:52,374 INFO [train.py:1136] (0/2) Epoch 39, batch 100, loss[loss=0.1678, simple_loss=0.2697, pruned_loss=0.03298, over 86988.00 frames. ], tot_loss[loss=0.1731, simple_loss=0.2733, pruned_loss=0.03641, over 6833921.97 frames. ], batch size: 350, lr: 6.60e-03, grad_scale: 32.0 2024-10-08 16:56:54,016 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.379e+02 4.026e+02 4.453e+02 4.886e+02 7.072e+02, threshold=8.906e+02, percent-clipped=0.0 2024-10-08 16:56:56,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=373296.0, ans=0.125 2024-10-08 16:57:02,997 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=373296.0, ans=0.015 2024-10-08 16:57:34,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=373536.0, ans=0.2 2024-10-08 16:57:37,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=373536.0, ans=0.125 2024-10-08 16:57:39,924 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.62 vs. 
limit=6.0 2024-10-08 16:58:23,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=373776.0, ans=0.2 2024-10-08 16:58:26,585 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=373896.0, ans=10.0 2024-10-08 16:58:27,859 INFO [train.py:1136] (0/2) Epoch 39, batch 150, loss[loss=0.1703, simple_loss=0.2634, pruned_loss=0.03862, over 87372.00 frames. ], tot_loss[loss=0.1725, simple_loss=0.2731, pruned_loss=0.03588, over 9156877.29 frames. ], batch size: 280, lr: 6.59e-03, grad_scale: 32.0 2024-10-08 16:59:54,921 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-10-08 17:00:00,988 INFO [train.py:1136] (0/2) Epoch 39, batch 200, loss[loss=0.1827, simple_loss=0.2876, pruned_loss=0.03895, over 85379.00 frames. ], tot_loss[loss=0.172, simple_loss=0.2729, pruned_loss=0.03556, over 10958136.75 frames. ], batch size: 786, lr: 6.59e-03, grad_scale: 32.0 2024-10-08 17:00:02,668 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.292e+02 3.895e+02 4.250e+02 4.755e+02 7.508e+02, threshold=8.501e+02, percent-clipped=0.0 2024-10-08 17:00:07,903 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.78 vs. limit=12.0 2024-10-08 17:00:15,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=374496.0, ans=0.125 2024-10-08 17:01:08,649 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=374856.0, ans=0.125 2024-10-08 17:01:36,235 INFO [train.py:1136] (0/2) Epoch 39, batch 250, loss[loss=0.1656, simple_loss=0.2634, pruned_loss=0.03393, over 87287.00 frames. ], tot_loss[loss=0.1728, simple_loss=0.2736, pruned_loss=0.03593, over 12330113.09 frames. ], batch size: 313, lr: 6.58e-03, grad_scale: 32.0 2024-10-08 17:01:36,778 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=375096.0, ans=0.2 2024-10-08 17:01:45,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=375096.0, ans=0.125 2024-10-08 17:02:00,958 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=375216.0, ans=0.2 2024-10-08 17:02:09,525 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=375216.0, ans=0.2 2024-10-08 17:02:45,895 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=375456.0, ans=10.0 2024-10-08 17:02:52,672 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=375576.0, ans=0.0 2024-10-08 17:03:09,446 INFO [train.py:1136] (0/2) Epoch 39, batch 300, loss[loss=0.166, simple_loss=0.2735, pruned_loss=0.02922, over 87361.00 frames. ], tot_loss[loss=0.1732, simple_loss=0.2744, pruned_loss=0.03598, over 13398675.74 frames. 
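The scaling.py:1024 records come from Whiten modules that monitor how far the covariance of a layer's activations is from a multiple of the identity: the metric is 1.0 for perfectly "white" features and grows as a few directions dominate, and the penalty pushing activations back toward whiteness only engages when the metric exceeds the limit, so a line like `metric=3.62 vs. limit=6.0` is informational rather than an error. Below is one plausible formulation of such a metric, reconstructed rather than copied from scaling.py, for a single group of channels.

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """Eigenvalue-spread measure of the channel covariance of x.

    x: (num_frames, num_channels). Returns d * sum(eig^2) / sum(eig)^2,
    which is 1.0 iff all covariance eigenvalues are equal (white features).
    """
    x = x - x.mean(dim=0)               # remove channel means
    cov = (x.t() @ x) / x.shape[0]      # (C, C) covariance estimate
    d = cov.shape[0]
    # sum(eig^2) = trace(C @ C) = sum of squared entries (C symmetric);
    # sum(eig)   = trace(C), so no eigendecomposition is needed.
    return d * (cov * cov).sum() / (cov.diagonal().sum() ** 2 + 1e-20)

x = torch.randn(1000, 384)                     # near-white features
print(whitening_metric(x))                     # ~1.0 plus sampling noise
print(whitening_metric(x * torch.rand(384)))   # anisotropic -> larger metric
```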
], batch size: 490, lr: 6.58e-03, grad_scale: 32.0 2024-10-08 17:03:11,139 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.441e+02 3.950e+02 4.360e+02 4.998e+02 7.455e+02, threshold=8.719e+02, percent-clipped=0.0 2024-10-08 17:03:48,681 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=375936.0, ans=0.2 2024-10-08 17:03:48,834 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=375936.0, ans=0.125 2024-10-08 17:03:52,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=375936.0, ans=0.2 2024-10-08 17:04:22,470 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=2.45 vs. limit=12.0 2024-10-08 17:04:23,769 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=376176.0, ans=0.125 2024-10-08 17:04:33,555 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=376176.0, ans=0.2 2024-10-08 17:04:40,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=376176.0, ans=0.0 2024-10-08 17:04:45,468 INFO [train.py:1136] (0/2) Epoch 39, batch 350, loss[loss=0.1827, simple_loss=0.2848, pruned_loss=0.04031, over 86158.00 frames. ], tot_loss[loss=0.1737, simple_loss=0.2746, pruned_loss=0.03637, over 14201659.23 frames. ], batch size: 667, lr: 6.57e-03, grad_scale: 32.0 2024-10-08 17:05:15,269 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=376416.0, ans=0.1 2024-10-08 17:05:34,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=376536.0, ans=0.2 2024-10-08 17:05:44,141 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.73 vs. limit=15.0 2024-10-08 17:05:45,593 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=376656.0, ans=0.2 2024-10-08 17:06:03,226 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=376776.0, ans=0.1 2024-10-08 17:06:17,965 INFO [train.py:1136] (0/2) Epoch 39, batch 400, loss[loss=0.1614, simple_loss=0.2673, pruned_loss=0.02777, over 87413.00 frames. ], tot_loss[loss=0.1734, simple_loss=0.2743, pruned_loss=0.03619, over 14882761.73 frames. ], batch size: 464, lr: 6.57e-03, grad_scale: 32.0 2024-10-08 17:06:19,697 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.421e+02 3.969e+02 4.277e+02 4.872e+02 6.939e+02, threshold=8.555e+02, percent-clipped=0.0 2024-10-08 17:06:52,436 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=377016.0, ans=0.125 2024-10-08 17:07:06,811 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.50 vs. 
limit=6.0 2024-10-08 17:07:19,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=377256.0, ans=10.0 2024-10-08 17:07:29,508 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=377256.0, ans=0.2 2024-10-08 17:07:48,001 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.31 vs. limit=15.0 2024-10-08 17:07:53,832 INFO [train.py:1136] (0/2) Epoch 39, batch 450, loss[loss=0.1695, simple_loss=0.2717, pruned_loss=0.03367, over 87036.00 frames. ], tot_loss[loss=0.1741, simple_loss=0.2749, pruned_loss=0.03667, over 15292540.56 frames. ], batch size: 350, lr: 6.56e-03, grad_scale: 32.0 2024-10-08 17:08:21,289 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=377616.0, ans=0.125 2024-10-08 17:08:39,616 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.36 vs. limit=22.5 2024-10-08 17:08:42,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=377736.0, ans=0.2 2024-10-08 17:08:42,399 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=377736.0, ans=0.125 2024-10-08 17:08:46,053 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=377736.0, ans=0.07 2024-10-08 17:09:26,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=377976.0, ans=0.125 2024-10-08 17:09:29,958 INFO [train.py:1136] (0/2) Epoch 39, batch 500, loss[loss=0.1806, simple_loss=0.277, pruned_loss=0.04212, over 87151.00 frames. ], tot_loss[loss=0.1743, simple_loss=0.275, pruned_loss=0.03679, over 15673008.11 frames. ], batch size: 296, lr: 6.56e-03, grad_scale: 32.0 2024-10-08 17:09:31,652 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.519e+02 3.943e+02 4.487e+02 5.220e+02 7.107e+02, threshold=8.974e+02, percent-clipped=0.0 2024-10-08 17:09:41,116 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=378096.0, ans=0.07 2024-10-08 17:10:12,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=378336.0, ans=0.04949747468305833 2024-10-08 17:10:58,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=378576.0, ans=0.125 2024-10-08 17:11:03,309 INFO [train.py:1136] (0/2) Epoch 39, batch 550, loss[loss=0.162, simple_loss=0.2703, pruned_loss=0.02684, over 87226.00 frames. ], tot_loss[loss=0.1743, simple_loss=0.2753, pruned_loss=0.0367, over 16000946.38 frames. ], batch size: 517, lr: 6.55e-03, grad_scale: 32.0 2024-10-08 17:12:06,877 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.35 vs. 
limit=15.0 2024-10-08 17:12:13,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=379056.0, ans=0.125 2024-10-08 17:12:31,048 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=6.09 vs. limit=15.0 2024-10-08 17:12:38,573 INFO [train.py:1136] (0/2) Epoch 39, batch 600, loss[loss=0.1617, simple_loss=0.2583, pruned_loss=0.03261, over 86672.00 frames. ], tot_loss[loss=0.1743, simple_loss=0.2752, pruned_loss=0.03667, over 16248873.41 frames. ], batch size: 213, lr: 6.55e-03, grad_scale: 32.0 2024-10-08 17:12:40,353 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.504e+02 3.953e+02 4.249e+02 4.740e+02 7.515e+02, threshold=8.497e+02, percent-clipped=0.0 2024-10-08 17:13:20,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=379536.0, ans=0.125 2024-10-08 17:13:29,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=379536.0, ans=0.0 2024-10-08 17:13:43,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=379656.0, ans=0.0 2024-10-08 17:14:00,706 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=379776.0, ans=0.0 2024-10-08 17:14:07,510 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=379776.0, ans=0.125 2024-10-08 17:14:12,924 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=379896.0, ans=0.2 2024-10-08 17:14:14,262 INFO [train.py:1136] (0/2) Epoch 39, batch 650, loss[loss=0.1614, simple_loss=0.2598, pruned_loss=0.03155, over 86607.00 frames. ], tot_loss[loss=0.1743, simple_loss=0.2752, pruned_loss=0.03671, over 16459381.00 frames. ], batch size: 229, lr: 6.54e-03, grad_scale: 32.0 2024-10-08 17:14:21,373 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=379896.0, ans=0.0 2024-10-08 17:14:23,705 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.22 vs. 
limit=10.0 2024-10-08 17:14:38,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=380016.0, ans=0.125 2024-10-08 17:14:43,095 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=380016.0, ans=0.125 2024-10-08 17:14:50,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=380136.0, ans=0.2 2024-10-08 17:14:54,006 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=380136.0, ans=0.0 2024-10-08 17:15:19,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=380256.0, ans=0.0 2024-10-08 17:15:23,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=380256.0, ans=0.125 2024-10-08 17:15:29,432 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=380376.0, ans=0.2 2024-10-08 17:15:41,992 INFO [train.py:1136] (0/2) Epoch 39, batch 700, loss[loss=0.188, simple_loss=0.2915, pruned_loss=0.04223, over 84377.00 frames. ], tot_loss[loss=0.1742, simple_loss=0.275, pruned_loss=0.0367, over 16627827.41 frames. ], batch size: 957, lr: 6.54e-03, grad_scale: 32.0 2024-10-08 17:15:43,518 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.429e+02 3.852e+02 4.118e+02 4.782e+02 7.978e+02, threshold=8.236e+02, percent-clipped=0.0 2024-10-08 17:15:47,321 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=380496.0, ans=0.0 2024-10-08 17:16:03,848 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=380616.0, ans=0.0 2024-10-08 17:16:10,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=380616.0, ans=0.1 2024-10-08 17:16:47,389 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=380856.0, ans=0.1 2024-10-08 17:16:56,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=380976.0, ans=0.125 2024-10-08 17:17:07,594 INFO [train.py:1136] (0/2) Epoch 39, batch 750, loss[loss=0.1833, simple_loss=0.2893, pruned_loss=0.03863, over 83242.00 frames. ], tot_loss[loss=0.1751, simple_loss=0.2759, pruned_loss=0.03717, over 16681190.96 frames. ], batch size: 1077, lr: 6.53e-03, grad_scale: 32.0 2024-10-08 17:17:20,334 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=381096.0, ans=0.125 2024-10-08 17:17:25,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=381216.0, ans=0.125 2024-10-08 17:17:48,081 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.73 vs. limit=15.0 2024-10-08 17:17:52,358 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.81 vs. 
limit=15.0 2024-10-08 17:17:54,691 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=381336.0, ans=0.125 2024-10-08 17:18:31,231 INFO [train.py:1136] (0/2) Epoch 39, batch 800, loss[loss=0.1717, simple_loss=0.272, pruned_loss=0.03572, over 87145.00 frames. ], tot_loss[loss=0.1755, simple_loss=0.2761, pruned_loss=0.03745, over 16735410.83 frames. ], batch size: 330, lr: 6.53e-03, grad_scale: 32.0 2024-10-08 17:18:31,530 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=381696.0, ans=0.125 2024-10-08 17:18:32,787 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.471e+02 4.227e+02 4.666e+02 5.407e+02 8.558e+02, threshold=9.332e+02, percent-clipped=1.0 2024-10-08 17:18:33,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=381696.0, ans=0.09899494936611666 2024-10-08 17:18:58,236 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-39.pt 2024-10-08 17:19:34,069 INFO [train.py:1136] (0/2) Epoch 40, batch 0, loss[loss=0.1827, simple_loss=0.2853, pruned_loss=0.04005, over 85411.00 frames. ], tot_loss[loss=0.1827, simple_loss=0.2853, pruned_loss=0.04005, over 85411.00 frames. ], batch size: 787, lr: 6.44e-03, grad_scale: 32.0 2024-10-08 17:19:34,070 INFO [train.py:1159] (0/2) Computing validation loss 2024-10-08 17:19:45,173 INFO [train.py:1168] (0/2) Epoch 40, validation: loss=0.1666, simple_loss=0.2763, pruned_loss=0.02842, over 1382211.00 frames. 2024-10-08 17:19:45,174 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB 2024-10-08 17:20:19,468 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=382008.0, ans=0.0 2024-10-08 17:20:27,058 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.74 vs. limit=15.0 2024-10-08 17:20:30,077 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.67 vs. limit=15.0 2024-10-08 17:20:30,350 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.45 vs. limit=15.0 2024-10-08 17:20:56,202 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.56 vs. limit=15.0 2024-10-08 17:20:58,903 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=382368.0, ans=0.125 2024-10-08 17:21:18,812 INFO [train.py:1136] (0/2) Epoch 40, batch 50, loss[loss=0.1794, simple_loss=0.28, pruned_loss=0.03942, over 87114.00 frames. ], tot_loss[loss=0.172, simple_loss=0.2733, pruned_loss=0.03532, over 3883316.89 frames. ], batch size: 548, lr: 6.44e-03, grad_scale: 16.0 2024-10-08 17:21:19,897 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.81 vs. 
limit=6.0 2024-10-08 17:21:22,753 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=382488.0, ans=0.125 2024-10-08 17:21:24,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=382488.0, ans=0.1 2024-10-08 17:21:34,964 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-10-08 17:22:30,276 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.500e+02 4.174e+02 4.771e+02 5.671e+02 7.768e+02, threshold=9.543e+02, percent-clipped=0.0 2024-10-08 17:22:55,234 INFO [train.py:1136] (0/2) Epoch 40, batch 100, loss[loss=0.1608, simple_loss=0.2586, pruned_loss=0.03146, over 86700.00 frames. ], tot_loss[loss=0.1729, simple_loss=0.2742, pruned_loss=0.03583, over 6790843.46 frames. ], batch size: 213, lr: 6.43e-03, grad_scale: 8.0 2024-10-08 17:23:00,959 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=383088.0, ans=0.05 2024-10-08 17:23:21,861 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.76 vs. limit=15.0 2024-10-08 17:23:33,280 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=383328.0, ans=0.0 2024-10-08 17:23:37,627 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.74 vs. limit=15.0 2024-10-08 17:24:01,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=383448.0, ans=0.1 2024-10-08 17:24:13,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=383568.0, ans=0.125 2024-10-08 17:24:14,086 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=383568.0, ans=0.0 2024-10-08 17:24:17,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=383568.0, ans=0.125 2024-10-08 17:24:27,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=383688.0, ans=0.2 2024-10-08 17:24:29,030 INFO [train.py:1136] (0/2) Epoch 40, batch 150, loss[loss=0.2024, simple_loss=0.3012, pruned_loss=0.05175, over 78575.00 frames. ], tot_loss[loss=0.1732, simple_loss=0.2742, pruned_loss=0.03614, over 9044626.12 frames. 
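checkpoint.py:75 fires in two situations visible in this log: an `epoch-NN.pt` snapshot at each epoch boundary, and a batch-count snapshot such as `checkpoint-32000.pt` (just below) taken every fixed number of training batches. A minimal sketch of such a save helper follows; the field names are illustrative, and icefall's real checkpoints store more state (sampler, grad scaler, schedulers) than shown here.

```python
import torch
from pathlib import Path

def save_checkpoint(path: Path, model, optimizer, batch_idx_train: int) -> None:
    """Persist enough state to resume training from this point."""
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "batch_idx_train": batch_idx_train,
        },
        path,
    )

exp_dir = Path("zipformer/exp")
# At an epoch boundary:
#   save_checkpoint(exp_dir / "epoch-40.pt", model, optimizer, batch_idx)
# Every save_every_n training batches:
#   save_checkpoint(exp_dir / f"checkpoint-{batch_idx}.pt", model, optimizer, batch_idx)
```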
], batch size: 1493, lr: 6.43e-03, grad_scale: 8.0 2024-10-08 17:24:37,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=383688.0, ans=0.2 2024-10-08 17:25:17,905 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/checkpoint-32000.pt 2024-10-08 17:25:35,654 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=384048.0, ans=0.1 2024-10-08 17:25:45,395 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.585e+02 3.981e+02 4.485e+02 5.222e+02 1.573e+03, threshold=8.969e+02, percent-clipped=1.0 2024-10-08 17:26:09,062 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=384288.0, ans=0.125 2024-10-08 17:26:10,282 INFO [train.py:1136] (0/2) Epoch 40, batch 200, loss[loss=0.2001, simple_loss=0.2987, pruned_loss=0.05076, over 78820.00 frames. ], tot_loss[loss=0.1738, simple_loss=0.2749, pruned_loss=0.03633, over 10811555.15 frames. ], batch size: 1493, lr: 6.42e-03, grad_scale: 8.0 2024-10-08 17:26:26,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=384288.0, ans=6.0 2024-10-08 17:26:31,097 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=384408.0, ans=0.0 2024-10-08 17:26:39,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=384408.0, ans=0.125 2024-10-08 17:26:44,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=384408.0, ans=0.1 2024-10-08 17:27:20,086 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=384648.0, ans=0.0 2024-10-08 17:27:20,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=384648.0, ans=0.125 2024-10-08 17:27:33,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=384768.0, ans=0.125 2024-10-08 17:27:45,751 INFO [train.py:1136] (0/2) Epoch 40, batch 250, loss[loss=0.1666, simple_loss=0.2702, pruned_loss=0.03147, over 87336.00 frames. ], tot_loss[loss=0.1737, simple_loss=0.2747, pruned_loss=0.03634, over 12217824.08 frames. ], batch size: 439, lr: 6.42e-03, grad_scale: 8.0 2024-10-08 17:27:56,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=384888.0, ans=0.125 2024-10-08 17:28:16,781 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.24 vs. limit=22.5 2024-10-08 17:28:30,740 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=385128.0, ans=0.2 2024-10-08 17:28:31,196 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=2.94 vs. 
limit=15.0 2024-10-08 17:28:57,201 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.523e+02 4.024e+02 4.329e+02 4.933e+02 7.294e+02, threshold=8.658e+02, percent-clipped=0.0 2024-10-08 17:28:57,549 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=385248.0, ans=0.035 2024-10-08 17:29:01,091 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-10-08 17:29:12,832 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=385368.0, ans=0.2 2024-10-08 17:29:14,499 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=385368.0, ans=0.0 2024-10-08 17:29:19,247 INFO [train.py:1136] (0/2) Epoch 40, batch 300, loss[loss=0.1715, simple_loss=0.2707, pruned_loss=0.03621, over 87018.00 frames. ], tot_loss[loss=0.1733, simple_loss=0.2742, pruned_loss=0.03621, over 13331842.06 frames. ], batch size: 330, lr: 6.41e-03, grad_scale: 8.0 2024-10-08 17:30:17,635 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.31 vs. limit=22.5 2024-10-08 17:30:36,034 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.17 vs. limit=10.0 2024-10-08 17:30:54,519 INFO [train.py:1136] (0/2) Epoch 40, batch 350, loss[loss=0.184, simple_loss=0.2889, pruned_loss=0.03952, over 83463.00 frames. ], tot_loss[loss=0.1736, simple_loss=0.2742, pruned_loss=0.03649, over 14173993.62 frames. ], batch size: 1077, lr: 6.41e-03, grad_scale: 8.0 2024-10-08 17:31:10,171 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=386208.0, ans=0.125 2024-10-08 17:31:20,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=386208.0, ans=0.125 2024-10-08 17:31:40,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=386328.0, ans=0.1 2024-10-08 17:32:02,203 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.03 vs. limit=15.0 2024-10-08 17:32:02,867 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.376e+02 4.018e+02 4.338e+02 5.055e+02 6.387e+02, threshold=8.675e+02, percent-clipped=0.0 2024-10-08 17:32:24,741 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=386568.0, ans=0.125 2024-10-08 17:32:27,752 INFO [train.py:1136] (0/2) Epoch 40, batch 400, loss[loss=0.1853, simple_loss=0.2827, pruned_loss=0.04397, over 86950.00 frames. ], tot_loss[loss=0.1731, simple_loss=0.2738, pruned_loss=0.03618, over 14822648.59 frames. ], batch size: 547, lr: 6.41e-03, grad_scale: 16.0 2024-10-08 17:32:38,422 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.79 vs. 
limit=15.0 2024-10-08 17:32:39,478 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=386688.0, ans=0.125 2024-10-08 17:33:39,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=387048.0, ans=0.125 2024-10-08 17:34:01,797 INFO [train.py:1136] (0/2) Epoch 40, batch 450, loss[loss=0.1813, simple_loss=0.287, pruned_loss=0.03782, over 85854.00 frames. ], tot_loss[loss=0.1728, simple_loss=0.2738, pruned_loss=0.03594, over 15335136.49 frames. ], batch size: 721, lr: 6.40e-03, grad_scale: 16.0 2024-10-08 17:34:14,950 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=387288.0, ans=0.05 2024-10-08 17:34:23,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=387408.0, ans=0.5 2024-10-08 17:34:48,610 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.46 vs. limit=10.0 2024-10-08 17:34:53,464 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=387528.0, ans=0.125 2024-10-08 17:34:56,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=387648.0, ans=0.015 2024-10-08 17:35:00,951 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=387648.0, ans=0.025 2024-10-08 17:35:06,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=387648.0, ans=0.2 2024-10-08 17:35:13,027 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.400e+02 4.044e+02 4.605e+02 5.224e+02 7.169e+02, threshold=9.210e+02, percent-clipped=0.0 2024-10-08 17:35:22,089 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=387768.0, ans=0.0 2024-10-08 17:35:34,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=387888.0, ans=0.125 2024-10-08 17:35:35,335 INFO [train.py:1136] (0/2) Epoch 40, batch 500, loss[loss=0.1814, simple_loss=0.2853, pruned_loss=0.03873, over 86306.00 frames. ], tot_loss[loss=0.1727, simple_loss=0.2737, pruned_loss=0.03583, over 15739854.96 frames. 
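In each train.py:1136 record, `loss[... over N frames.]` is the current batch and `tot_loss[... over M frames.]` is a frame-weighted running average whose frame count M grows across the epoch and resets at batch 0 of the next epoch (compare the ~16.7M-frame totals at batch 750 with the single-batch totals at batch 0). A minimal sketch of that frame-weighted bookkeeping is below; icefall's tracker additionally decays the contribution of old batches, which is omitted here.

```python
class RunningLoss:
    """Frame-weighted running average of per-batch losses."""

    def __init__(self):
        self.loss_sum = 0.0   # sum of (loss * frames) over seen batches
        self.frames = 0.0     # total frames seen

    def update(self, loss: float, num_frames: float) -> None:
        self.loss_sum += loss * num_frames
        self.frames += num_frames

    @property
    def tot_loss(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

tracker = RunningLoss()
tracker.update(0.1853, 86950.0)   # numbers in the style of the log
tracker.update(0.1814, 86306.0)
print(f"tot_loss={tracker.tot_loss:.4f} over {tracker.frames:.2f} frames")
```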
], batch size: 667, lr: 6.40e-03, grad_scale: 16.0 2024-10-08 17:35:35,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=387888.0, ans=0.025 2024-10-08 17:35:37,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=387888.0, ans=0.025 2024-10-08 17:35:51,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=388008.0, ans=0.2 2024-10-08 17:35:53,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=388008.0, ans=0.025 2024-10-08 17:36:06,144 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=388008.0, ans=0.1 2024-10-08 17:36:17,450 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.55 vs. limit=15.0 2024-10-08 17:36:21,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=388128.0, ans=0.09899494936611666 2024-10-08 17:36:35,485 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=388248.0, ans=0.125 2024-10-08 17:37:08,898 INFO [train.py:1136] (0/2) Epoch 40, batch 550, loss[loss=0.1596, simple_loss=0.2562, pruned_loss=0.03154, over 86651.00 frames. ], tot_loss[loss=0.1732, simple_loss=0.2742, pruned_loss=0.03609, over 16018964.44 frames. ], batch size: 213, lr: 6.39e-03, grad_scale: 16.0 2024-10-08 17:37:15,701 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=388488.0, ans=0.025 2024-10-08 17:37:36,418 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=388608.0, ans=0.125 2024-10-08 17:37:55,100 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=388728.0, ans=0.125 2024-10-08 17:38:17,314 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.296e+02 4.050e+02 4.528e+02 5.046e+02 7.205e+02, threshold=9.057e+02, percent-clipped=0.0 2024-10-08 17:38:38,091 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=388968.0, ans=0.125 2024-10-08 17:38:44,209 INFO [train.py:1136] (0/2) Epoch 40, batch 600, loss[loss=0.1959, simple_loss=0.2962, pruned_loss=0.04778, over 78572.00 frames. ], tot_loss[loss=0.1727, simple_loss=0.2737, pruned_loss=0.0358, over 16275968.04 frames. ], batch size: 1493, lr: 6.39e-03, grad_scale: 16.0 2024-10-08 17:39:37,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=389328.0, ans=0.2 2024-10-08 17:39:51,366 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.79 vs. limit=15.0 2024-10-08 17:39:57,784 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=389568.0, ans=0.2 2024-10-08 17:40:14,649 INFO [train.py:1136] (0/2) Epoch 40, batch 650, loss[loss=0.1648, simple_loss=0.2719, pruned_loss=0.02888, over 87312.00 frames. 
2024-10-08 17:40:39,051 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=389808.0, ans=0.2
2024-10-08 17:40:39,630 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.58 vs. limit=22.5
2024-10-08 17:40:44,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=389808.0, ans=0.2
2024-10-08 17:40:47,803 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0
2024-10-08 17:41:20,115 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.417e+02 3.987e+02 4.353e+02 4.908e+02 7.609e+02, threshold=8.707e+02, percent-clipped=0.0
2024-10-08 17:41:33,563 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=390168.0, ans=0.0
2024-10-08 17:41:44,815 INFO [train.py:1136] (0/2) Epoch 40, batch 700, loss[loss=0.1776, simple_loss=0.2792, pruned_loss=0.03795, over 86312.00 frames. ], tot_loss[loss=0.1731, simple_loss=0.2743, pruned_loss=0.03592, over 16579543.22 frames. ], batch size: 620, lr: 6.38e-03, grad_scale: 16.0
2024-10-08 17:41:48,688 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.22 vs. limit=15.0
2024-10-08 17:42:38,399 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=390648.0, ans=0.2
2024-10-08 17:42:42,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=390648.0, ans=0.125
2024-10-08 17:42:43,424 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.94 vs. limit=15.0
2024-10-08 17:42:44,459 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-10-08 17:42:55,236 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=390768.0, ans=0.04949747468305833
2024-10-08 17:43:07,476 INFO [train.py:1136] (0/2) Epoch 40, batch 750, loss[loss=0.1784, simple_loss=0.2834, pruned_loss=0.03675, over 84621.00 frames. ], tot_loss[loss=0.1734, simple_loss=0.2744, pruned_loss=0.03617, over 16671848.72 frames. ], batch size: 957, lr: 6.37e-03, grad_scale: 16.0
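The recurring WARNING lines from optim.py print five quartile statistics of recent gradient norms plus the clipping threshold derived from them. A minimal sketch of that idea follows; the window length and the "threshold = 2 x median" rule are assumptions chosen to match the shape of the logged numbers, not the exact logic of optim.py.

    from collections import deque
    import torch

    norm_history: deque = deque(maxlen=200)  # recent per-step gradient norms

    def clip_by_quartiles(model: torch.nn.Module) -> None:
        # max_norm=inf only measures the total norm, it does not clip.
        norm = torch.nn.utils.clip_grad_norm_(model.parameters(), float("inf"))
        norm_history.append(float(norm))
        qs = torch.quantile(torch.tensor(list(norm_history)),
                            torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = 2.0 * qs[2].item()  # assumed rule: twice the running median
        if norm > threshold:
            for p in model.parameters():
                if p.grad is not None:
                    p.grad.mul_(threshold / norm)

Logging qs and threshold each time this fires would reproduce lines like "grad-norm quartiles ... threshold=..., percent-clipped=...".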
2024-10-08 17:43:11,011 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=390888.0, ans=0.5
2024-10-08 17:43:26,684 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=391008.0, ans=0.1
2024-10-08 17:43:29,872 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=391008.0, ans=0.125
2024-10-08 17:44:00,210 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=391248.0, ans=0.125
2024-10-08 17:44:07,857 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.552e+02 4.115e+02 4.681e+02 5.332e+02 7.865e+02, threshold=9.361e+02, percent-clipped=0.0
2024-10-08 17:44:16,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=391368.0, ans=0.0
2024-10-08 17:44:30,668 INFO [train.py:1136] (0/2) Epoch 40, batch 800, loss[loss=0.1626, simple_loss=0.2562, pruned_loss=0.03454, over 86434.00 frames. ], tot_loss[loss=0.1733, simple_loss=0.2743, pruned_loss=0.03617, over 16739810.15 frames. ], batch size: 213, lr: 6.37e-03, grad_scale: 32.0
2024-10-08 17:44:32,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=391488.0, ans=0.125
2024-10-08 17:44:56,812 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-40.pt
2024-10-08 17:45:35,558 INFO [train.py:1136] (0/2) Epoch 41, batch 0, loss[loss=0.1676, simple_loss=0.2728, pruned_loss=0.03118, over 87303.00 frames. ], tot_loss[loss=0.1676, simple_loss=0.2728, pruned_loss=0.03118, over 87303.00 frames. ], batch size: 464, lr: 6.29e-03, grad_scale: 32.0
2024-10-08 17:45:35,559 INFO [train.py:1159] (0/2) Computing validation loss
2024-10-08 17:45:46,529 INFO [train.py:1168] (0/2) Epoch 41, validation: loss=0.1664, simple_loss=0.2761, pruned_loss=0.02838, over 1382211.00 frames.
2024-10-08 17:45:46,530 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB
2024-10-08 17:45:48,781 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=391680.0, ans=0.0
2024-10-08 17:46:03,430 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=391800.0, ans=0.125
2024-10-08 17:46:32,968 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=391920.0, ans=0.1
2024-10-08 17:46:37,976 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=391920.0, ans=0.1
2024-10-08 17:46:43,876 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=392040.0, ans=0.2
2024-10-08 17:47:00,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=392160.0, ans=0.2
2024-10-08 17:47:03,900 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=392160.0, ans=0.125
2024-10-08 17:47:20,041 INFO [train.py:1136] (0/2) Epoch 41, batch 50, loss[loss=0.1796, simple_loss=0.2805, pruned_loss=0.03939, over 86952.00 frames. ], tot_loss[loss=0.1728, simple_loss=0.2742, pruned_loss=0.03572, over 3873095.74 frames. ], batch size: 583, lr: 6.28e-03, grad_scale: 32.0
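The "Saving checkpoint to zipformer/exp/epoch-40.pt" line marks the end-of-epoch save. A minimal sketch of that pattern is below; save_epoch_checkpoint and the exact set of saved fields are illustrative assumptions, since the real checkpoint.py may store more state (sampler, grad scaler, best-loss bookkeeping).

    import torch
    from pathlib import Path

    def save_epoch_checkpoint(exp_dir: Path, epoch: int,
                              model, optimizer, scheduler) -> None:
        # One self-contained file per epoch, e.g. zipformer/exp/epoch-40.pt,
        # so training can resume from any finished epoch.
        torch.save(
            {
                "epoch": epoch,
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "scheduler": scheduler.state_dict(),
            },
            exp_dir / f"epoch-{epoch}.pt",
        )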
2024-10-08 17:47:42,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=392400.0, ans=0.125
2024-10-08 17:48:04,649 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.358e+02 3.994e+02 4.688e+02 5.330e+02 7.343e+02, threshold=9.375e+02, percent-clipped=0.0
2024-10-08 17:48:13,497 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=392520.0, ans=0.125
2024-10-08 17:48:25,179 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-10-08 17:48:35,329 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.30 vs. limit=12.0
2024-10-08 17:48:52,332 INFO [train.py:1136] (0/2) Epoch 41, batch 100, loss[loss=0.1716, simple_loss=0.2707, pruned_loss=0.03621, over 86997.00 frames. ], tot_loss[loss=0.1714, simple_loss=0.2716, pruned_loss=0.03562, over 6819293.09 frames. ], batch size: 350, lr: 6.28e-03, grad_scale: 16.0
2024-10-08 17:49:10,359 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=393000.0, ans=0.0
2024-10-08 17:49:10,885 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.50 vs. limit=22.5
2024-10-08 17:49:20,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=393000.0, ans=0.125
2024-10-08 17:49:22,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=393000.0, ans=0.125
2024-10-08 17:49:33,793 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.86 vs. limit=15.0
2024-10-08 17:50:24,288 INFO [train.py:1136] (0/2) Epoch 41, batch 150, loss[loss=0.1839, simple_loss=0.2848, pruned_loss=0.04146, over 86972.00 frames. ], tot_loss[loss=0.1713, simple_loss=0.2721, pruned_loss=0.03523, over 9140378.52 frames. ], batch size: 583, lr: 6.27e-03, grad_scale: 16.0
2024-10-08 17:51:01,803 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.53 vs. limit=15.0
2024-10-08 17:51:07,117 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.86 vs. limit=15.0
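The many "ScheduledFloat: name=..., batch_count=..., ans=..." lines report regularisation constants that are scheduled on the training batch count. A minimal sketch of such a schedule is below: piecewise-linear interpolation between (batch_count, value) breakpoints. The class body and the example breakpoints are assumptions for illustration; the logged `ans` is simply the schedule evaluated at the current batch_count.

    class ScheduledFloat:
        """A float hyper-parameter defined by (batch_count, value) breakpoints,
        interpolated linearly between them and held flat outside the range."""

        def __init__(self, *points: tuple[float, float]):
            self.points = sorted(points)

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # Hypothetical breakpoints: by batch_count 392400 the schedule has
    # long since reached its final value, hence the constant ans=0.125.
    prob = ScheduledFloat((0.0, 0.3), (20000.0, 0.125))
    assert prob.value(392400.0) == 0.125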
2024-10-08 17:51:09,229 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.439e+02 3.936e+02 4.223e+02 4.664e+02 6.980e+02, threshold=8.446e+02, percent-clipped=0.0
2024-10-08 17:51:13,208 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=393720.0, ans=0.125
2024-10-08 17:51:14,924 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=393720.0, ans=0.05
2024-10-08 17:51:55,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=393960.0, ans=0.025
2024-10-08 17:52:04,269 INFO [train.py:1136] (0/2) Epoch 41, batch 200, loss[loss=0.18, simple_loss=0.2849, pruned_loss=0.0375, over 83423.00 frames. ], tot_loss[loss=0.1727, simple_loss=0.2736, pruned_loss=0.03588, over 10902510.06 frames. ], batch size: 1077, lr: 6.27e-03, grad_scale: 16.0
2024-10-08 17:52:09,750 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=394080.0, ans=0.125
2024-10-08 17:52:30,135 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=394200.0, ans=0.2
2024-10-08 17:52:48,925 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.00 vs. limit=15.0
2024-10-08 17:53:05,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=394440.0, ans=0.125
2024-10-08 17:53:23,984 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=394560.0, ans=0.0
2024-10-08 17:53:37,826 INFO [train.py:1136] (0/2) Epoch 41, batch 250, loss[loss=0.1644, simple_loss=0.2713, pruned_loss=0.02882, over 87242.00 frames. ], tot_loss[loss=0.173, simple_loss=0.2737, pruned_loss=0.03619, over 12260091.33 frames. ], batch size: 517, lr: 6.26e-03, grad_scale: 16.0
2024-10-08 17:53:58,350 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-10-08 17:54:20,054 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.481e+02 4.072e+02 4.396e+02 4.989e+02 7.141e+02, threshold=8.792e+02, percent-clipped=0.0
2024-10-08 17:54:59,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=395160.0, ans=0.0
2024-10-08 17:55:11,276 INFO [train.py:1136] (0/2) Epoch 41, batch 300, loss[loss=0.1876, simple_loss=0.2876, pruned_loss=0.04377, over 69189.00 frames. ], tot_loss[loss=0.1731, simple_loss=0.274, pruned_loss=0.03611, over 13320468.50 frames. ], batch size: 1960, lr: 6.26e-03, grad_scale: 16.0
2024-10-08 17:55:17,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=395280.0, ans=0.0
2024-10-08 17:55:54,981 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.50 vs. limit=6.0
2024-10-08 17:56:08,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=395640.0, ans=0.125
2024-10-08 17:56:19,824 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.31 vs. limit=15.0
2024-10-08 17:56:24,018 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=395640.0, ans=0.1
2024-10-08 17:56:29,631 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=395760.0, ans=0.2
2024-10-08 17:56:46,548 INFO [train.py:1136] (0/2) Epoch 41, batch 350, loss[loss=0.1818, simple_loss=0.2829, pruned_loss=0.04035, over 85940.00 frames. ], tot_loss[loss=0.1727, simple_loss=0.2737, pruned_loss=0.03586, over 14201407.19 frames. ], batch size: 721, lr: 6.26e-03, grad_scale: 16.0
2024-10-08 17:57:31,446 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.367e+02 3.954e+02 4.348e+02 4.844e+02 7.599e+02, threshold=8.695e+02, percent-clipped=0.0
2024-10-08 17:57:37,187 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=396120.0, ans=0.1
2024-10-08 17:57:48,644 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=396240.0, ans=0.0
2024-10-08 17:57:50,634 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=396240.0, ans=0.0
2024-10-08 17:58:03,682 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=396360.0, ans=0.125
2024-10-08 17:58:22,741 INFO [train.py:1136] (0/2) Epoch 41, batch 400, loss[loss=0.1647, simple_loss=0.2634, pruned_loss=0.03296, over 86713.00 frames. ], tot_loss[loss=0.1723, simple_loss=0.2732, pruned_loss=0.03569, over 14842590.31 frames. ], batch size: 229, lr: 6.25e-03, grad_scale: 16.0
2024-10-08 17:58:33,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=396480.0, ans=0.1
2024-10-08 17:58:35,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=396480.0, ans=0.125
2024-10-08 17:58:49,222 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.18 vs. limit=15.0
2024-10-08 17:59:06,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=396720.0, ans=0.1
2024-10-08 17:59:31,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=396840.0, ans=0.07
2024-10-08 17:59:58,332 INFO [train.py:1136] (0/2) Epoch 41, batch 450, loss[loss=0.165, simple_loss=0.2673, pruned_loss=0.03132, over 87424.00 frames. ], tot_loss[loss=0.1727, simple_loss=0.2737, pruned_loss=0.03588, over 15362311.74 frames. ], batch size: 393, lr: 6.25e-03, grad_scale: 16.0
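The "Whitening: ... metric=X vs. limit=Y" lines compare a whiteness statistic of a module's activations against a scheduled limit; a penalty is applied only while the metric exceeds the limit. A minimal sketch of one plausible such metric is below: it equals 1.0 for a perfectly isotropic (white) covariance and grows as variance concentrates in few directions. This mirrors the idea behind the log lines, not necessarily the exact formula in scaling.py.

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        """x: (num_frames, num_channels) activations for one group."""
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]      # (C, C) covariance estimate
        d = cov.shape[0]
        # d * tr(C @ C) / tr(C)^2 == 1.0 iff C is a multiple of the identity.
        return d * (cov * cov).sum() / cov.trace() ** 2

With num_groups > 1 (as in the whiten_keys lines), the channels would be split into groups and the metric averaged or maximised over groups.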
2024-10-08 18:00:00,638 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=397080.0, ans=0.125
2024-10-08 18:00:09,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=397080.0, ans=0.125
2024-10-08 18:00:09,438 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=397080.0, ans=0.0
2024-10-08 18:00:42,232 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.361e+02 3.819e+02 4.226e+02 4.858e+02 6.179e+02, threshold=8.452e+02, percent-clipped=0.0
2024-10-08 18:01:06,635 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=397440.0, ans=0.0
2024-10-08 18:01:08,360 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=397440.0, ans=0.1
2024-10-08 18:01:25,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=397560.0, ans=0.125
2024-10-08 18:01:31,826 INFO [train.py:1136] (0/2) Epoch 41, batch 500, loss[loss=0.1827, simple_loss=0.2849, pruned_loss=0.04022, over 86408.00 frames. ], tot_loss[loss=0.1725, simple_loss=0.2733, pruned_loss=0.0358, over 15769023.32 frames. ], batch size: 620, lr: 6.24e-03, grad_scale: 16.0
2024-10-08 18:01:34,086 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=397680.0, ans=0.1
2024-10-08 18:01:54,366 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=397800.0, ans=0.125
2024-10-08 18:02:01,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=397800.0, ans=0.125
2024-10-08 18:02:10,226 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=397920.0, ans=0.125
2024-10-08 18:02:15,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=397920.0, ans=0.125
2024-10-08 18:02:27,337 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=398040.0, ans=0.125
2024-10-08 18:02:34,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=398040.0, ans=0.0
2024-10-08 18:02:42,809 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=398040.0, ans=0.125
2024-10-08 18:02:57,987 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=3.92 vs. limit=15.0
2024-10-08 18:03:02,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=398160.0, ans=0.125
2024-10-08 18:03:09,281 INFO [train.py:1136] (0/2) Epoch 41, batch 550, loss[loss=0.165, simple_loss=0.2682, pruned_loss=0.03092, over 87355.00 frames. ], tot_loss[loss=0.1724, simple_loss=0.2734, pruned_loss=0.03576, over 16068376.01 frames. ], batch size: 415, lr: 6.24e-03, grad_scale: 16.0
2024-10-08 18:03:17,119 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.01 vs. limit=15.0
2024-10-08 18:03:18,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=398280.0, ans=0.125
2024-10-08 18:03:18,847 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.64 vs. limit=15.0
2024-10-08 18:03:54,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=398520.0, ans=0.0
2024-10-08 18:03:56,048 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.335e+02 3.962e+02 4.375e+02 5.008e+02 7.569e+02, threshold=8.751e+02, percent-clipped=0.0
2024-10-08 18:03:58,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=398520.0, ans=0.1
2024-10-08 18:04:07,369 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.59 vs. limit=22.5
2024-10-08 18:04:14,667 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.30 vs. limit=15.0
2024-10-08 18:04:46,521 INFO [train.py:1136] (0/2) Epoch 41, batch 600, loss[loss=0.1842, simple_loss=0.2869, pruned_loss=0.0408, over 85949.00 frames. ], tot_loss[loss=0.1724, simple_loss=0.2735, pruned_loss=0.03562, over 16298266.26 frames. ], batch size: 721, lr: 6.23e-03, grad_scale: 16.0
2024-10-08 18:05:12,151 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=399000.0, ans=0.125
2024-10-08 18:05:35,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=399120.0, ans=0.2
2024-10-08 18:05:43,945 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=399240.0, ans=0.125
2024-10-08 18:06:25,458 INFO [train.py:1136] (0/2) Epoch 41, batch 650, loss[loss=0.2048, simple_loss=0.3026, pruned_loss=0.05347, over 78503.00 frames. ], tot_loss[loss=0.1729, simple_loss=0.2741, pruned_loss=0.03586, over 16439303.60 frames. ], batch size: 1493, lr: 6.23e-03, grad_scale: 16.0
2024-10-08 18:06:34,402 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=399480.0, ans=0.125
2024-10-08 18:06:36,688 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.10 vs. limit=15.0
2024-10-08 18:07:10,359 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.477e+02 4.018e+02 4.372e+02 5.107e+02 6.799e+02, threshold=8.744e+02, percent-clipped=0.0
2024-10-08 18:07:42,633 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=399960.0, ans=0.125
2024-10-08 18:07:47,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=399960.0, ans=0.125
2024-10-08 18:07:52,343 INFO [train.py:1136] (0/2) Epoch 41, batch 700, loss[loss=0.1798, simple_loss=0.284, pruned_loss=0.03784, over 85489.00 frames. ], tot_loss[loss=0.1728, simple_loss=0.274, pruned_loss=0.03576, over 16606049.54 frames. ], batch size: 786, lr: 6.22e-03, grad_scale: 16.0
2024-10-08 18:08:00,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=400080.0, ans=0.0
2024-10-08 18:08:13,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=400200.0, ans=0.125
2024-10-08 18:08:18,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=400200.0, ans=0.125
2024-10-08 18:08:21,703 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=400200.0, ans=0.0
2024-10-08 18:08:59,929 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=400560.0, ans=0.1
2024-10-08 18:09:17,184 INFO [train.py:1136] (0/2) Epoch 41, batch 750, loss[loss=0.1655, simple_loss=0.2595, pruned_loss=0.03574, over 85554.00 frames. ], tot_loss[loss=0.1726, simple_loss=0.274, pruned_loss=0.03561, over 16710570.98 frames. ], batch size: 180, lr: 6.22e-03, grad_scale: 16.0
2024-10-08 18:09:24,624 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.57 vs. limit=15.0
2024-10-08 18:09:32,519 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=400800.0, ans=0.04949747468305833
2024-10-08 18:09:38,878 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.86 vs. limit=15.0
2024-10-08 18:09:42,073 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=7.36 vs. limit=15.0
2024-10-08 18:09:51,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=400920.0, ans=0.125
2024-10-08 18:09:52,165 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.96 vs. limit=6.0
2024-10-08 18:09:57,635 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.31 vs. limit=15.0
2024-10-08 18:09:59,517 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.507e+02 3.979e+02 4.489e+02 5.495e+02 1.424e+03, threshold=8.978e+02, percent-clipped=3.0
2024-10-08 18:10:14,849 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.87 vs. limit=6.0
2024-10-08 18:10:27,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=401160.0, ans=0.125
2024-10-08 18:10:40,970 INFO [train.py:1136] (0/2) Epoch 41, batch 800, loss[loss=0.1893, simple_loss=0.2849, pruned_loss=0.04685, over 69517.00 frames. ], tot_loss[loss=0.1727, simple_loss=0.2739, pruned_loss=0.03572, over 16692726.78 frames. ], batch size: 1960, lr: 6.22e-03, grad_scale: 16.0
2024-10-08 18:11:06,980 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-41.pt
2024-10-08 18:12:02,233 INFO [train.py:1136] (0/2) Epoch 42, batch 0, loss[loss=0.162, simple_loss=0.2704, pruned_loss=0.02679, over 87175.00 frames. ], tot_loss[loss=0.162, simple_loss=0.2704, pruned_loss=0.02679, over 87175.00 frames. ], batch size: 517, lr: 6.14e-03, grad_scale: 32.0
2024-10-08 18:12:02,235 INFO [train.py:1159] (0/2) Computing validation loss
2024-10-08 18:12:14,223 INFO [train.py:1168] (0/2) Epoch 42, validation: loss=0.1666, simple_loss=0.2753, pruned_loss=0.02894, over 1382211.00 frames.
2024-10-08 18:12:14,224 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB
2024-10-08 18:12:17,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=401472.0, ans=0.0
2024-10-08 18:12:24,479 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=401472.0, ans=0.125
2024-10-08 18:12:28,597 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.05 vs. limit=15.0
2024-10-08 18:12:32,271 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.12 vs. limit=15.0
2024-10-08 18:12:38,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=401592.0, ans=0.0
2024-10-08 18:12:40,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=401592.0, ans=0.0
2024-10-08 18:12:53,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=401712.0, ans=0.125
2024-10-08 18:13:06,632 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=401712.0, ans=0.1
2024-10-08 18:13:12,271 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.78 vs. limit=22.5
2024-10-08 18:13:43,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=401952.0, ans=0.125
2024-10-08 18:13:51,107 INFO [train.py:1136] (0/2) Epoch 42, batch 50, loss[loss=0.1647, simple_loss=0.2553, pruned_loss=0.03711, over 85870.00 frames. ], tot_loss[loss=0.1711, simple_loss=0.2719, pruned_loss=0.03514, over 3860825.70 frames. ], batch size: 180, lr: 6.13e-03, grad_scale: 32.0
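Across the epoch boundary above, the logged learning rate steps down (6.22e-03 at the end of epoch 41, 6.14e-03 at the start of epoch 42) and then continues to decay slowly within the epoch. This is consistent with an Eden-style schedule whose factor depends on both batch index and (fractional) epoch; the sketch below shows that shape, with base_lr, lr_batches and lr_epochs as assumed hyper-parameters.

    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # Both factors start near 1.0 and decay like x**-0.5 for large
        # batch/epoch, giving a smooth, ever-slowing decay.
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor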
2024-10-08 18:13:56,689 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=402072.0, ans=0.025
2024-10-08 18:14:05,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=402072.0, ans=0.2
2024-10-08 18:14:06,666 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.327e+02 3.788e+02 4.340e+02 5.001e+02 7.023e+02, threshold=8.679e+02, percent-clipped=0.0
2024-10-08 18:14:06,995 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=402192.0, ans=10.0
2024-10-08 18:15:25,462 INFO [train.py:1136] (0/2) Epoch 42, batch 100, loss[loss=0.1793, simple_loss=0.2834, pruned_loss=0.03762, over 85765.00 frames. ], tot_loss[loss=0.1711, simple_loss=0.272, pruned_loss=0.03508, over 6814976.40 frames. ], batch size: 721, lr: 6.13e-03, grad_scale: 32.0
2024-10-08 18:15:28,298 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.50 vs. limit=22.5
2024-10-08 18:15:30,855 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=402672.0, ans=0.125
2024-10-08 18:16:31,835 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=403032.0, ans=0.1
2024-10-08 18:16:50,224 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=403152.0, ans=0.125
2024-10-08 18:16:53,214 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=403152.0, ans=0.0
2024-10-08 18:17:06,210 INFO [train.py:1136] (0/2) Epoch 42, batch 150, loss[loss=0.1668, simple_loss=0.2649, pruned_loss=0.03435, over 86803.00 frames. ], tot_loss[loss=0.1714, simple_loss=0.2727, pruned_loss=0.03504, over 9100404.09 frames. ], batch size: 229, lr: 6.12e-03, grad_scale: 32.0
2024-10-08 18:17:13,909 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=403272.0, ans=10.0
2024-10-08 18:17:24,384 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.444e+02 3.918e+02 4.232e+02 4.842e+02 6.832e+02, threshold=8.465e+02, percent-clipped=0.0
2024-10-08 18:17:26,554 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=403392.0, ans=0.125
2024-10-08 18:17:28,653 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.88 vs. limit=10.0
2024-10-08 18:17:34,967 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.97 vs. limit=15.0
2024-10-08 18:18:41,197 INFO [train.py:1136] (0/2) Epoch 42, batch 200, loss[loss=0.1571, simple_loss=0.2573, pruned_loss=0.02846, over 87076.00 frames. ], tot_loss[loss=0.1709, simple_loss=0.2722, pruned_loss=0.03477, over 10898376.46 frames. ], batch size: 264, lr: 6.12e-03, grad_scale: 32.0
2024-10-08 18:18:44,011 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.39 vs. limit=22.5
2024-10-08 18:18:46,990 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=403872.0, ans=0.125
2024-10-08 18:18:50,303 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=403872.0, ans=0.125
2024-10-08 18:19:12,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=403992.0, ans=0.025
2024-10-08 18:19:13,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=404112.0, ans=0.2
2024-10-08 18:19:37,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=404232.0, ans=0.05
2024-10-08 18:20:06,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=404352.0, ans=0.2
2024-10-08 18:20:14,609 INFO [train.py:1136] (0/2) Epoch 42, batch 250, loss[loss=0.1678, simple_loss=0.2717, pruned_loss=0.03197, over 87367.00 frames. ], tot_loss[loss=0.1708, simple_loss=0.2719, pruned_loss=0.03479, over 12285065.51 frames. ], batch size: 415, lr: 6.12e-03, grad_scale: 16.0
2024-10-08 18:20:21,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=404472.0, ans=10.0
2024-10-08 18:20:26,924 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=404472.0, ans=0.0
2024-10-08 18:20:26,968 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=404472.0, ans=0.05
2024-10-08 18:20:31,500 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.323e+02 3.939e+02 4.380e+02 4.945e+02 6.969e+02, threshold=8.760e+02, percent-clipped=0.0
2024-10-08 18:20:35,620 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=404592.0, ans=0.0
2024-10-08 18:20:42,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=404592.0, ans=0.1
2024-10-08 18:21:04,091 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=404712.0, ans=0.125
2024-10-08 18:21:25,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=404952.0, ans=0.0
2024-10-08 18:21:25,312 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=404952.0, ans=0.125
2024-10-08 18:21:46,195 INFO [train.py:1136] (0/2) Epoch 42, batch 300, loss[loss=0.1633, simple_loss=0.2594, pruned_loss=0.03354, over 86776.00 frames. ], tot_loss[loss=0.1716, simple_loss=0.2729, pruned_loss=0.03519, over 13346885.61 frames. ], batch size: 229, lr: 6.11e-03, grad_scale: 16.0
2024-10-08 18:21:58,157 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=405072.0, ans=0.05
2024-10-08 18:23:11,895 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.02 vs. limit=15.0
2024-10-08 18:23:19,571 INFO [train.py:1136] (0/2) Epoch 42, batch 350, loss[loss=0.1839, simple_loss=0.2848, pruned_loss=0.04149, over 86416.00 frames. ], tot_loss[loss=0.1719, simple_loss=0.2731, pruned_loss=0.03531, over 14155328.15 frames. ], batch size: 620, lr: 6.11e-03, grad_scale: 16.0
2024-10-08 18:23:41,233 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.466e+02 3.922e+02 4.201e+02 4.667e+02 6.208e+02, threshold=8.403e+02, percent-clipped=0.0
2024-10-08 18:24:43,753 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=406152.0, ans=0.1
2024-10-08 18:24:43,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=406152.0, ans=0.025
2024-10-08 18:24:55,052 INFO [train.py:1136] (0/2) Epoch 42, batch 400, loss[loss=0.1673, simple_loss=0.2715, pruned_loss=0.03149, over 87334.00 frames. ], tot_loss[loss=0.172, simple_loss=0.273, pruned_loss=0.0355, over 14786505.89 frames. ], batch size: 439, lr: 6.10e-03, grad_scale: 32.0
2024-10-08 18:25:14,978 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=6.19 vs. limit=15.0
2024-10-08 18:25:41,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=406512.0, ans=0.1
2024-10-08 18:25:47,194 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.08 vs. limit=15.0
2024-10-08 18:26:16,855 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=406752.0, ans=0.07
2024-10-08 18:26:28,480 INFO [train.py:1136] (0/2) Epoch 42, batch 450, loss[loss=0.1617, simple_loss=0.267, pruned_loss=0.02823, over 87381.00 frames. ], tot_loss[loss=0.172, simple_loss=0.2731, pruned_loss=0.03544, over 15298774.41 frames. ], batch size: 439, lr: 6.10e-03, grad_scale: 32.0
2024-10-08 18:26:48,042 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.469e+02 3.968e+02 4.481e+02 5.108e+02 8.190e+02, threshold=8.963e+02, percent-clipped=0.0
2024-10-08 18:26:53,635 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=406992.0, ans=0.1
2024-10-08 18:27:22,124 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=407112.0, ans=0.035
2024-10-08 18:28:05,115 INFO [train.py:1136] (0/2) Epoch 42, batch 500, loss[loss=0.1558, simple_loss=0.2542, pruned_loss=0.02868, over 86526.00 frames. ], tot_loss[loss=0.1727, simple_loss=0.2738, pruned_loss=0.03583, over 15640288.18 frames. ], batch size: 229, lr: 6.09e-03, grad_scale: 32.0
2024-10-08 18:28:10,773 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=407472.0, ans=0.125
2024-10-08 18:28:26,569 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=407592.0, ans=0.0
2024-10-08 18:28:33,277 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=407592.0, ans=0.2
2024-10-08 18:28:58,442 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=407712.0, ans=0.2
2024-10-08 18:29:37,178 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=407952.0, ans=0.125
2024-10-08 18:29:41,689 INFO [train.py:1136] (0/2) Epoch 42, batch 550, loss[loss=0.1901, simple_loss=0.2942, pruned_loss=0.04299, over 81851.00 frames. ], tot_loss[loss=0.1731, simple_loss=0.274, pruned_loss=0.03607, over 15915027.73 frames. ], batch size: 1245, lr: 6.09e-03, grad_scale: 32.0
2024-10-08 18:30:01,898 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.497e+02 3.977e+02 4.356e+02 4.819e+02 8.907e+02, threshold=8.712e+02, percent-clipped=0.0
2024-10-08 18:30:15,004 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-10-08 18:30:38,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=408432.0, ans=0.0
2024-10-08 18:30:52,182 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=408432.0, ans=0.125
2024-10-08 18:31:17,875 INFO [train.py:1136] (0/2) Epoch 42, batch 600, loss[loss=0.1626, simple_loss=0.2606, pruned_loss=0.03226, over 86688.00 frames. ], tot_loss[loss=0.1729, simple_loss=0.274, pruned_loss=0.03588, over 16191723.57 frames. ], batch size: 246, lr: 6.09e-03, grad_scale: 32.0
2024-10-08 18:31:28,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=408672.0, ans=0.125
2024-10-08 18:31:30,816 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=408672.0, ans=0.0
2024-10-08 18:31:34,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=408792.0, ans=0.0
2024-10-08 18:31:54,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=408912.0, ans=0.125
2024-10-08 18:31:57,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=408912.0, ans=0.0
2024-10-08 18:32:45,675 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=409152.0, ans=0.125
2024-10-08 18:32:46,022 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.85 vs. limit=12.0
2024-10-08 18:32:51,926 INFO [train.py:1136] (0/2) Epoch 42, batch 650, loss[loss=0.1829, simple_loss=0.2884, pruned_loss=0.03868, over 85318.00 frames. ], tot_loss[loss=0.173, simple_loss=0.274, pruned_loss=0.03602, over 16415520.90 frames. ], batch size: 866, lr: 6.08e-03, grad_scale: 32.0
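In the per-batch lines, `loss[... over N frames]` is the current batch while `tot_loss[... over M frames]` carries a steadily growing frame count, consistent with a frame-weighted running average over the epoch so far. A minimal sketch of that bookkeeping, with RunningLoss as an assumed illustrative helper:

    class RunningLoss:
        """Frame-weighted running average, as a `tot_loss over M frames`
        style accumulator would maintain."""

        def __init__(self) -> None:
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss: float, batch_frames: float) -> None:
            self.loss_sum += batch_loss * batch_frames
            self.frames += batch_frames

        @property
        def value(self) -> float:
            return self.loss_sum / max(self.frames, 1.0)

Weighting by frames rather than by batch makes the reported average insensitive to the widely varying batch sizes visible in the log (213 up to 1960).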
2024-10-08 18:32:52,475 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=409272.0, ans=0.025
2024-10-08 18:33:11,350 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.500e+02 3.932e+02 4.345e+02 4.882e+02 6.924e+02, threshold=8.689e+02, percent-clipped=0.0
2024-10-08 18:33:19,303 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.63 vs. limit=22.5
2024-10-08 18:33:21,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=409392.0, ans=10.0
2024-10-08 18:33:59,273 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=409632.0, ans=0.125
2024-10-08 18:34:18,604 INFO [train.py:1136] (0/2) Epoch 42, batch 700, loss[loss=0.1703, simple_loss=0.2665, pruned_loss=0.03705, over 87259.00 frames. ], tot_loss[loss=0.1732, simple_loss=0.2739, pruned_loss=0.03621, over 16535958.74 frames. ], batch size: 280, lr: 6.08e-03, grad_scale: 16.0
2024-10-08 18:34:35,691 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.99 vs. limit=22.5
2024-10-08 18:34:46,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=409992.0, ans=0.1
2024-10-08 18:34:58,762 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=410112.0, ans=0.2
2024-10-08 18:35:05,918 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.88 vs. limit=15.0
2024-10-08 18:35:06,862 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=410112.0, ans=0.0
2024-10-08 18:35:10,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=410232.0, ans=0.95
2024-10-08 18:35:13,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=410232.0, ans=0.125
2024-10-08 18:35:32,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=410352.0, ans=0.0
2024-10-08 18:35:36,229 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=410352.0, ans=0.125
2024-10-08 18:35:43,627 INFO [train.py:1136] (0/2) Epoch 42, batch 750, loss[loss=0.1887, simple_loss=0.2865, pruned_loss=0.04546, over 69454.00 frames. ], tot_loss[loss=0.1726, simple_loss=0.2735, pruned_loss=0.0359, over 16673979.42 frames. ], batch size: 1960, lr: 6.07e-03, grad_scale: 16.0
2024-10-08 18:35:46,507 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.86 vs. limit=15.0
2024-10-08 18:36:02,120 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.56 vs. limit=15.0
2024-10-08 18:36:02,457 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.431e+02 3.881e+02 4.261e+02 4.773e+02 6.440e+02, threshold=8.522e+02, percent-clipped=0.0
2024-10-08 18:36:09,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=410592.0, ans=0.025
2024-10-08 18:36:34,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=410832.0, ans=0.2
2024-10-08 18:36:39,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=410832.0, ans=0.125
2024-10-08 18:36:40,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=410832.0, ans=0.0
2024-10-08 18:37:06,676 INFO [train.py:1136] (0/2) Epoch 42, batch 800, loss[loss=0.1818, simple_loss=0.2882, pruned_loss=0.03767, over 83392.00 frames. ], tot_loss[loss=0.1728, simple_loss=0.2736, pruned_loss=0.03599, over 16734630.35 frames. ], batch size: 1079, lr: 6.07e-03, grad_scale: 16.0
2024-10-08 18:37:19,516 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.44 vs. limit=22.5
2024-10-08 18:37:22,273 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.71 vs. limit=22.5
2024-10-08 18:37:28,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=411192.0, ans=0.125
2024-10-08 18:37:32,444 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-42.pt
2024-10-08 18:38:07,931 INFO [train.py:1136] (0/2) Epoch 43, batch 0, loss[loss=0.1632, simple_loss=0.2624, pruned_loss=0.03205, over 87110.00 frames. ], tot_loss[loss=0.1632, simple_loss=0.2624, pruned_loss=0.03205, over 87110.00 frames. ], batch size: 330, lr: 6.00e-03, grad_scale: 32.0
2024-10-08 18:38:07,933 INFO [train.py:1159] (0/2) Computing validation loss
2024-10-08 18:38:17,408 INFO [zipformer.py:1883] (0/2) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.2307, 2.2373, 2.5830, 3.2217], device='cuda:0')
2024-10-08 18:38:19,377 INFO [train.py:1168] (0/2) Epoch 43, validation: loss=0.1671, simple_loss=0.2752, pruned_loss=0.0295, over 1382211.00 frames.
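The zipformer.py:1883 line above prints one entropy value per attention head for a self-attention module, a quick diagnostic for heads that have collapsed onto a single position (entropy near 0) or stayed uniform (entropy near log of the sequence length). A minimal sketch of that statistic, with attn_weights_entropy as an assumed illustrative helper:

    import torch

    def attn_weights_entropy(attn: torch.Tensor,
                             eps: float = 1e-20) -> torch.Tensor:
        """attn: (num_heads, tgt_len, src_len), each row a probability
        distribution over source positions. Returns one value per head:
        the entropy of each row, averaged over target positions."""
        return -(attn * (attn + eps).log()).sum(dim=-1).mean(dim=-1)

The four values in the logged tensor would then correspond to four heads of that layer's self-attention.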
2024-10-08 18:38:19,378 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB
2024-10-08 18:38:21,629 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=411264.0, ans=0.035
2024-10-08 18:38:37,343 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=411384.0, ans=0.125
2024-10-08 18:39:12,973 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=411504.0, ans=0.04949747468305833
2024-10-08 18:39:28,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=411624.0, ans=0.125
2024-10-08 18:39:32,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=411624.0, ans=0.0
2024-10-08 18:39:45,227 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.331e+02 3.969e+02 4.419e+02 4.928e+02 7.282e+02, threshold=8.838e+02, percent-clipped=0.0
2024-10-08 18:39:54,619 INFO [train.py:1136] (0/2) Epoch 43, batch 50, loss[loss=0.1685, simple_loss=0.2648, pruned_loss=0.03614, over 87372.00 frames. ], tot_loss[loss=0.1705, simple_loss=0.2717, pruned_loss=0.03463, over 3877396.89 frames. ], batch size: 280, lr: 5.99e-03, grad_scale: 32.0
2024-10-08 18:40:25,565 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.60 vs. limit=15.0
2024-10-08 18:40:40,621 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=412104.0, ans=0.5
2024-10-08 18:40:41,069 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=412104.0, ans=15.0
2024-10-08 18:40:45,511 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=412104.0, ans=0.0
2024-10-08 18:41:11,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=412344.0, ans=0.07
2024-10-08 18:41:28,064 INFO [train.py:1136] (0/2) Epoch 43, batch 100, loss[loss=0.1652, simple_loss=0.2586, pruned_loss=0.03587, over 86490.00 frames. ], tot_loss[loss=0.1706, simple_loss=0.2714, pruned_loss=0.03487, over 6808506.03 frames. ], batch size: 229, lr: 5.99e-03, grad_scale: 32.0
2024-10-08 18:41:42,990 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.14 vs. limit=15.0
2024-10-08 18:42:00,748 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=412584.0, ans=0.025
2024-10-08 18:42:17,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=412704.0, ans=0.1
2024-10-08 18:42:31,328 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-10-08 18:42:38,428 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=412824.0, ans=0.0
2024-10-08 18:42:40,121 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=412944.0, ans=0.125
2024-10-08 18:42:56,367 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.366e+02 3.980e+02 4.302e+02 4.893e+02 6.731e+02, threshold=8.603e+02, percent-clipped=0.0
2024-10-08 18:43:01,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=413064.0, ans=0.125
2024-10-08 18:43:03,032 INFO [train.py:1136] (0/2) Epoch 43, batch 150, loss[loss=0.171, simple_loss=0.2714, pruned_loss=0.03536, over 87435.00 frames. ], tot_loss[loss=0.1707, simple_loss=0.272, pruned_loss=0.03474, over 9132793.72 frames. ], batch size: 372, lr: 5.98e-03, grad_scale: 32.0
2024-10-08 18:43:03,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=413064.0, ans=0.035
2024-10-08 18:43:08,667 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=413064.0, ans=0.125
2024-10-08 18:43:11,996 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=413064.0, ans=0.125
2024-10-08 18:44:19,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=413544.0, ans=0.1
2024-10-08 18:44:36,998 INFO [train.py:1136] (0/2) Epoch 43, batch 200, loss[loss=0.1677, simple_loss=0.2672, pruned_loss=0.03407, over 87378.00 frames. ], tot_loss[loss=0.1705, simple_loss=0.272, pruned_loss=0.03451, over 10930673.04 frames. ], batch size: 313, lr: 5.98e-03, grad_scale: 32.0
2024-10-08 18:44:40,779 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=413664.0, ans=0.0
2024-10-08 18:45:07,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=413784.0, ans=0.0
2024-10-08 18:45:14,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=413904.0, ans=0.125
2024-10-08 18:45:20,477 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.91 vs. limit=22.5
2024-10-08 18:45:54,135 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=414144.0, ans=0.125
2024-10-08 18:46:04,190 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.525e+02 4.006e+02 4.375e+02 5.199e+02 7.456e+02, threshold=8.750e+02, percent-clipped=0.0
2024-10-08 18:46:11,032 INFO [train.py:1136] (0/2) Epoch 43, batch 250, loss[loss=0.1781, simple_loss=0.2824, pruned_loss=0.03686, over 86343.00 frames. ], tot_loss[loss=0.1706, simple_loss=0.2721, pruned_loss=0.03455, over 12294078.41 frames. ], batch size: 667, lr: 5.98e-03, grad_scale: 32.0
2024-10-08 18:46:24,404 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=414264.0, ans=0.0
2024-10-08 18:46:38,663 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=414384.0, ans=0.2
2024-10-08 18:47:19,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=414624.0, ans=0.1
2024-10-08 18:47:47,077 INFO [train.py:1136] (0/2) Epoch 43, batch 300, loss[loss=0.17, simple_loss=0.2621, pruned_loss=0.03897, over 86121.00 frames. ], tot_loss[loss=0.1714, simple_loss=0.2727, pruned_loss=0.03507, over 13368504.01 frames. ], batch size: 197, lr: 5.97e-03, grad_scale: 32.0
2024-10-08 18:48:00,992 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=414864.0, ans=0.1
2024-10-08 18:48:08,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=414984.0, ans=0.125
2024-10-08 18:48:26,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=415104.0, ans=0.0
2024-10-08 18:48:41,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=415224.0, ans=0.125
2024-10-08 18:48:55,597 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=415224.0, ans=0.125
2024-10-08 18:49:14,905 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.493e+02 3.909e+02 4.440e+02 4.995e+02 7.583e+02, threshold=8.881e+02, percent-clipped=0.0
2024-10-08 18:49:17,977 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=10.99 vs. limit=22.5
2024-10-08 18:49:22,649 INFO [train.py:1136] (0/2) Epoch 43, batch 350, loss[loss=0.1606, simple_loss=0.2596, pruned_loss=0.03083, over 86640.00 frames. ], tot_loss[loss=0.1714, simple_loss=0.2728, pruned_loss=0.035, over 14200205.29 frames. ], batch size: 229, lr: 5.97e-03, grad_scale: 32.0
2024-10-08 18:49:37,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=415464.0, ans=0.125
2024-10-08 18:49:42,106 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=415584.0, ans=0.125
2024-10-08 18:50:24,588 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=415824.0, ans=0.1
2024-10-08 18:50:59,064 INFO [train.py:1136] (0/2) Epoch 43, batch 400, loss[loss=0.165, simple_loss=0.2651, pruned_loss=0.0325, over 87239.00 frames. ], tot_loss[loss=0.1714, simple_loss=0.2728, pruned_loss=0.03503, over 14837113.46 frames. ], batch size: 330, lr: 5.96e-03, grad_scale: 32.0
], tot_loss[loss=0.1714, simple_loss=0.2728, pruned_loss=0.03503, over 14837113.46 frames. ], batch size: 330, lr: 5.96e-03, grad_scale: 32.0 2024-10-08 18:51:27,829 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=416184.0, ans=0.125 2024-10-08 18:52:26,365 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.540e+02 4.008e+02 4.450e+02 5.267e+02 6.710e+02, threshold=8.901e+02, percent-clipped=0.0 2024-10-08 18:52:28,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=416544.0, ans=0.125 2024-10-08 18:52:31,794 INFO [train.py:1136] (0/2) Epoch 43, batch 450, loss[loss=0.1654, simple_loss=0.2689, pruned_loss=0.03094, over 87355.00 frames. ], tot_loss[loss=0.1713, simple_loss=0.2728, pruned_loss=0.0349, over 15356603.27 frames. ], batch size: 415, lr: 5.96e-03, grad_scale: 32.0 2024-10-08 18:52:46,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=416664.0, ans=0.0 2024-10-08 18:53:00,874 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=416784.0, ans=0.0 2024-10-08 18:53:22,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=416904.0, ans=0.125 2024-10-08 18:53:55,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=417144.0, ans=0.07 2024-10-08 18:54:03,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=417144.0, ans=0.1 2024-10-08 18:54:07,139 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=417264.0, ans=0.0 2024-10-08 18:54:08,386 INFO [train.py:1136] (0/2) Epoch 43, batch 500, loss[loss=0.1804, simple_loss=0.2845, pruned_loss=0.03809, over 85375.00 frames. ], tot_loss[loss=0.171, simple_loss=0.2726, pruned_loss=0.03472, over 15753089.72 frames. ], batch size: 866, lr: 5.95e-03, grad_scale: 32.0 2024-10-08 18:55:00,707 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=417504.0, ans=0.1 2024-10-08 18:55:39,808 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.345e+02 4.299e+02 4.821e+02 5.858e+02 8.271e+02, threshold=9.641e+02, percent-clipped=0.0 2024-10-08 18:55:45,174 INFO [train.py:1136] (0/2) Epoch 43, batch 550, loss[loss=0.1687, simple_loss=0.2663, pruned_loss=0.0355, over 87432.00 frames. ], tot_loss[loss=0.1713, simple_loss=0.2726, pruned_loss=0.03499, over 15995245.78 frames. 
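
The three loss fields in each train.py:1136 entry are consistent with the pruned-transducer objective being reported as loss = 0.5 * simple_loss + pruned_loss (0.5 being, presumably, the configured simple-loss scale): e.g. 0.5 * 0.2721 + 0.03455 = 0.1706 for the epoch-43 batch-250 total above. A quick check against a few logged totals, assuming only that relation:

```python
# (loss, simple_loss, pruned_loss) triples copied from tot_loss[...] entries.
entries = [
    (0.1706, 0.2721, 0.03455),  # epoch 43, batch 250
    (0.1714, 0.2727, 0.03507),  # epoch 43, batch 300
    (0.1713, 0.2728, 0.03490),  # epoch 43, batch 450
]
for loss, simple, pruned in entries:
    assert abs(0.5 * simple + pruned - loss) < 5e-4, (loss, simple, pruned)
print("loss = 0.5 * simple_loss + pruned_loss holds to logging precision")
```
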
], batch size: 313, lr: 5.95e-03, grad_scale: 32.0 2024-10-08 18:56:10,077 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=417984.0, ans=0.2 2024-10-08 18:56:43,322 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=418224.0, ans=0.025 2024-10-08 18:56:45,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=418224.0, ans=0.1 2024-10-08 18:56:46,703 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=418224.0, ans=0.2 2024-10-08 18:57:11,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=418344.0, ans=0.125 2024-10-08 18:57:18,144 INFO [train.py:1136] (0/2) Epoch 43, batch 600, loss[loss=0.1655, simple_loss=0.2699, pruned_loss=0.03055, over 87272.00 frames. ], tot_loss[loss=0.1715, simple_loss=0.273, pruned_loss=0.03494, over 16261019.01 frames. ], batch size: 439, lr: 5.95e-03, grad_scale: 32.0 2024-10-08 18:57:41,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=418584.0, ans=0.025 2024-10-08 18:57:59,685 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=418704.0, ans=0.1 2024-10-08 18:58:00,069 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.68 vs. limit=22.5 2024-10-08 18:58:13,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=418704.0, ans=0.125 2024-10-08 18:58:15,789 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.45 vs. limit=15.0 2024-10-08 18:58:24,115 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=418824.0, ans=0.2 2024-10-08 18:58:27,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=418824.0, ans=0.2 2024-10-08 18:58:34,873 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.79 vs. limit=22.5 2024-10-08 18:58:45,451 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.436e+02 4.095e+02 4.509e+02 5.128e+02 8.633e+02, threshold=9.017e+02, percent-clipped=0.0 2024-10-08 18:58:50,497 INFO [train.py:1136] (0/2) Epoch 43, batch 650, loss[loss=0.1676, simple_loss=0.2634, pruned_loss=0.03591, over 87302.00 frames. ], tot_loss[loss=0.1714, simple_loss=0.2729, pruned_loss=0.03497, over 16484392.44 frames. 
], batch size: 280, lr: 5.94e-03, grad_scale: 32.0 2024-10-08 18:58:51,067 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=419064.0, ans=0.125 2024-10-08 18:59:30,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=419304.0, ans=0.0 2024-10-08 18:59:40,158 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=419304.0, ans=0.0 2024-10-08 18:59:55,238 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=419424.0, ans=0.1 2024-10-08 19:00:22,441 INFO [train.py:1136] (0/2) Epoch 43, batch 700, loss[loss=0.1903, simple_loss=0.294, pruned_loss=0.04331, over 81981.00 frames. ], tot_loss[loss=0.1716, simple_loss=0.273, pruned_loss=0.0351, over 16615124.52 frames. ], batch size: 1245, lr: 5.94e-03, grad_scale: 32.0 2024-10-08 19:00:41,543 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=419784.0, ans=0.125 2024-10-08 19:00:51,426 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.63 vs. limit=22.5 2024-10-08 19:01:18,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=420024.0, ans=0.0 2024-10-08 19:01:42,476 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.371e+02 3.950e+02 4.251e+02 4.846e+02 9.431e+02, threshold=8.501e+02, percent-clipped=1.0 2024-10-08 19:01:44,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=420264.0, ans=0.125 2024-10-08 19:01:45,769 INFO [train.py:1136] (0/2) Epoch 43, batch 750, loss[loss=0.1597, simple_loss=0.2595, pruned_loss=0.02997, over 87440.00 frames. ], tot_loss[loss=0.1716, simple_loss=0.2729, pruned_loss=0.03517, over 16715443.09 frames. ], batch size: 264, lr: 5.93e-03, grad_scale: 16.0 2024-10-08 19:02:08,274 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=420384.0, ans=0.125 2024-10-08 19:02:23,912 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=420504.0, ans=0.125 2024-10-08 19:02:35,480 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=420624.0, ans=0.0 2024-10-08 19:02:45,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=420624.0, ans=0.125 2024-10-08 19:02:56,607 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.62 vs. limit=10.0 2024-10-08 19:02:59,213 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=420744.0, ans=0.07 2024-10-08 19:03:09,779 INFO [train.py:1136] (0/2) Epoch 43, batch 800, loss[loss=0.1646, simple_loss=0.2728, pruned_loss=0.02821, over 87167.00 frames. ], tot_loss[loss=0.1721, simple_loss=0.2732, pruned_loss=0.03543, over 16708712.92 frames. 
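
The scaling.py:214 lines record ScheduledFloat values: module hyperparameters (dropout probabilities, skip rates, balancer probs, bypass scales) that are functions of the global batch count rather than constants, re-logged as training advances. A plausible minimal reimplementation as a piecewise-linear schedule (the breakpoints below are invented for illustration; by batch_count 414624 this schedule has long since flattened to its final value, matching the ans=0.1 dropout entries above):

```python
class ScheduledFloat:
    """A float hyperparameter interpolated piecewise-linearly in the
    training batch count. Sketch only; the real class has more machinery."""

    def __init__(self, *points):
        self.points = sorted(points)  # (batch_count, value) pairs

    def value(self, batch_count):
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

# Hypothetical dropout schedule: 0.3 at the start, decayed to 0.1 by batch 20000.
dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
assert dropout_p.value(414624.0) == 0.1
```
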
], batch size: 517, lr: 5.93e-03, grad_scale: 16.0 2024-10-08 19:03:16,799 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=420864.0, ans=0.125 2024-10-08 19:03:35,244 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-43.pt 2024-10-08 19:04:15,922 INFO [train.py:1136] (0/2) Epoch 44, batch 0, loss[loss=0.2048, simple_loss=0.3017, pruned_loss=0.05391, over 78815.00 frames. ], tot_loss[loss=0.2048, simple_loss=0.3017, pruned_loss=0.05391, over 78815.00 frames. ], batch size: 1493, lr: 5.86e-03, grad_scale: 32.0 2024-10-08 19:04:15,923 INFO [train.py:1159] (0/2) Computing validation loss 2024-10-08 19:04:26,818 INFO [train.py:1168] (0/2) Epoch 44, validation: loss=0.1668, simple_loss=0.2748, pruned_loss=0.02944, over 1382211.00 frames. 2024-10-08 19:04:26,818 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB 2024-10-08 19:05:18,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=421416.0, ans=0.125 2024-10-08 19:05:26,754 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.361e+02 4.035e+02 4.715e+02 5.708e+02 1.092e+03, threshold=9.430e+02, percent-clipped=3.0 2024-10-08 19:05:41,059 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.13 vs. limit=22.5 2024-10-08 19:05:53,535 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.35 vs. limit=15.0 2024-10-08 19:05:54,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=421656.0, ans=0.125 2024-10-08 19:05:55,580 INFO [train.py:1136] (0/2) Epoch 44, batch 50, loss[loss=0.1829, simple_loss=0.2871, pruned_loss=0.03933, over 82017.00 frames. ], tot_loss[loss=0.1732, simple_loss=0.2742, pruned_loss=0.03605, over 3835825.81 frames. ], batch size: 1245, lr: 5.86e-03, grad_scale: 16.0 2024-10-08 19:06:03,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=421656.0, ans=0.035 2024-10-08 19:06:08,453 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=421656.0, ans=0.035 2024-10-08 19:06:36,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=421896.0, ans=0.2 2024-10-08 19:06:41,420 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=421896.0, ans=0.1 2024-10-08 19:06:49,681 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=422016.0, ans=0.2 2024-10-08 19:07:02,652 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=422016.0, ans=0.125 2024-10-08 19:07:14,761 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=422136.0, ans=0.125 2024-10-08 19:07:26,763 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=422256.0, ans=0.1 2024-10-08 19:07:27,983 INFO [train.py:1136] (0/2) Epoch 44, batch 100, loss[loss=0.1672, simple_loss=0.2656, pruned_loss=0.03442, over 87062.00 frames. 
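
Every "batch 0" block follows the same pattern: train one batch, then run a full validation pass before continuing. The frame count is always 1382211.00, so the same dev cuts are scored each time, making the validation losses comparable across epochs (0.1668 here vs. 0.1676 and 0.1662 at the next two epoch boundaries). A schematic of that frame-weighted evaluation; compute_loss is an assumed helper, not the script's actual signature:

```python
import torch

def compute_validation_loss(compute_loss, model, valid_loader):
    """Frame-weighted average loss over a fixed dev set (sketch).
    compute_loss(model, batch) -> (loss_tensor, num_frames) is assumed."""
    was_training = model.training
    model.eval()
    tot, frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_loader:
            loss, n = compute_loss(model, batch)
            tot += loss.item() * n
            frames += n
    if was_training:
        model.train()
    return tot / frames  # e.g. 0.1668 over 1382211 frames
```
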
], tot_loss[loss=0.1689, simple_loss=0.2694, pruned_loss=0.03417, over 6832080.11 frames. ], batch size: 350, lr: 5.85e-03, grad_scale: 16.0 2024-10-08 19:07:47,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=422256.0, ans=0.125 2024-10-08 19:07:58,026 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=422376.0, ans=0.125 2024-10-08 19:07:59,557 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=422376.0, ans=0.1 2024-10-08 19:08:08,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=422496.0, ans=0.1 2024-10-08 19:08:34,817 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.404e+02 4.105e+02 4.445e+02 5.041e+02 7.404e+02, threshold=8.890e+02, percent-clipped=0.0 2024-10-08 19:08:50,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=422736.0, ans=0.125 2024-10-08 19:08:53,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=422736.0, ans=0.0 2024-10-08 19:08:57,049 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=422736.0, ans=0.125 2024-10-08 19:09:05,543 INFO [train.py:1136] (0/2) Epoch 44, batch 150, loss[loss=0.1804, simple_loss=0.2848, pruned_loss=0.03796, over 85288.00 frames. ], tot_loss[loss=0.1699, simple_loss=0.2707, pruned_loss=0.03456, over 9121976.59 frames. ], batch size: 786, lr: 5.85e-03, grad_scale: 16.0 2024-10-08 19:09:22,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=422856.0, ans=0.125 2024-10-08 19:09:46,630 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=423096.0, ans=0.125 2024-10-08 19:09:46,747 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=423096.0, ans=0.025 2024-10-08 19:09:48,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=423096.0, ans=0.125 2024-10-08 19:10:12,396 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=423216.0, ans=0.125 2024-10-08 19:10:41,227 INFO [train.py:1136] (0/2) Epoch 44, batch 200, loss[loss=0.163, simple_loss=0.2682, pruned_loss=0.02884, over 87321.00 frames. ], tot_loss[loss=0.1699, simple_loss=0.271, pruned_loss=0.03438, over 10908987.17 frames. ], batch size: 464, lr: 5.84e-03, grad_scale: 16.0 2024-10-08 19:11:10,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=423576.0, ans=0.025 2024-10-08 19:11:41,932 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.428e+02 3.960e+02 4.191e+02 4.716e+02 7.337e+02, threshold=8.383e+02, percent-clipped=0.0 2024-10-08 19:12:06,154 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.87 vs. 
limit=15.0 2024-10-08 19:12:10,957 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=423936.0, ans=0.125 2024-10-08 19:12:15,520 INFO [train.py:1136] (0/2) Epoch 44, batch 250, loss[loss=0.1773, simple_loss=0.2777, pruned_loss=0.03849, over 87050.00 frames. ], tot_loss[loss=0.1703, simple_loss=0.2717, pruned_loss=0.03448, over 12311651.07 frames. ], batch size: 583, lr: 5.84e-03, grad_scale: 16.0 2024-10-08 19:12:19,386 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=424056.0, ans=0.1 2024-10-08 19:13:10,503 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=424416.0, ans=0.125 2024-10-08 19:13:52,060 INFO [train.py:1136] (0/2) Epoch 44, batch 300, loss[loss=0.162, simple_loss=0.2547, pruned_loss=0.03463, over 86550.00 frames. ], tot_loss[loss=0.1711, simple_loss=0.2727, pruned_loss=0.03479, over 13347897.29 frames. ], batch size: 213, lr: 5.84e-03, grad_scale: 16.0 2024-10-08 19:13:59,815 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=424656.0, ans=0.05 2024-10-08 19:14:34,583 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=424896.0, ans=22.5 2024-10-08 19:14:55,787 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.510e+02 4.111e+02 4.438e+02 5.000e+02 8.376e+02, threshold=8.875e+02, percent-clipped=0.0 2024-10-08 19:15:25,755 INFO [train.py:1136] (0/2) Epoch 44, batch 350, loss[loss=0.1567, simple_loss=0.2511, pruned_loss=0.0311, over 86176.00 frames. ], tot_loss[loss=0.1704, simple_loss=0.2719, pruned_loss=0.03443, over 14204118.95 frames. ], batch size: 197, lr: 5.83e-03, grad_scale: 16.0 2024-10-08 19:15:42,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=425256.0, ans=0.2 2024-10-08 19:15:58,648 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=425376.0, ans=0.05 2024-10-08 19:16:06,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=425496.0, ans=0.125 2024-10-08 19:16:14,750 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=425496.0, ans=0.125 2024-10-08 19:16:38,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=425616.0, ans=0.125 2024-10-08 19:16:49,602 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.67 vs. limit=15.0 2024-10-08 19:16:51,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=425736.0, ans=0.04949747468305833 2024-10-08 19:17:02,325 INFO [train.py:1136] (0/2) Epoch 44, batch 400, loss[loss=0.1598, simple_loss=0.2701, pruned_loss=0.02476, over 87432.00 frames. ], tot_loss[loss=0.1701, simple_loss=0.2719, pruned_loss=0.03414, over 14837272.94 frames. 
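
The scaling.py:1024 Whitening lines are sampled diagnostics from the whitening constraints on intermediate activations: the metric measures how far a layer's channel covariance is from a multiple of the identity (1.0 = perfectly white), and a corrective penalty applies only when it exceeds the logged limit (most entries here, e.g. metric=3.87 vs. limit=15.0 just above, sit comfortably under it). A sketch of one standard such metric, E[lambda^2] / E[lambda]^2 over the covariance eigenvalues computed via traces; this exact definition is an assumption about what the log reports:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    """x: (num_frames, num_channels). Returns 1.0 when the per-group
    channel covariance is a multiple of I, larger when energy is
    concentrated in a few directions. Assumed definition."""
    n, c = x.shape
    assert c % num_groups == 0
    x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)
    cov = x.transpose(1, 2) @ x / n                   # (groups, cg, cg)
    cg = cov.shape[-1]
    tr = cov.diagonal(dim1=1, dim2=2).sum(dim=-1)     # trace(C) per group
    tr2 = (cov * cov).sum(dim=(1, 2))                 # trace(C @ C), C symmetric
    return (tr2 * cg / tr ** 2).mean()                # E[l^2] / E[l]^2

x = torch.randn(10000, 384)
print(float(whitening_metric(x)))  # ~1.04: white noise sits near the minimum
```
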
], batch size: 464, lr: 5.83e-03, grad_scale: 32.0 2024-10-08 19:17:12,205 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=425856.0, ans=0.125 2024-10-08 19:17:48,315 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=426096.0, ans=0.1 2024-10-08 19:18:04,923 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.499e+02 3.970e+02 4.423e+02 4.924e+02 8.102e+02, threshold=8.846e+02, percent-clipped=0.0 2024-10-08 19:18:32,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=426456.0, ans=0.0 2024-10-08 19:18:34,244 INFO [train.py:1136] (0/2) Epoch 44, batch 450, loss[loss=0.1784, simple_loss=0.2852, pruned_loss=0.03577, over 83335.00 frames. ], tot_loss[loss=0.1697, simple_loss=0.2714, pruned_loss=0.03406, over 15388231.59 frames. ], batch size: 1077, lr: 5.82e-03, grad_scale: 32.0 2024-10-08 19:18:43,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=426456.0, ans=0.2 2024-10-08 19:18:44,391 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.61 vs. limit=15.0 2024-10-08 19:19:03,231 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.25 vs. limit=12.0 2024-10-08 19:19:37,053 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=426816.0, ans=0.0 2024-10-08 19:20:10,768 INFO [train.py:1136] (0/2) Epoch 44, batch 500, loss[loss=0.1615, simple_loss=0.2689, pruned_loss=0.02706, over 87192.00 frames. ], tot_loss[loss=0.1698, simple_loss=0.2712, pruned_loss=0.03417, over 15769326.67 frames. ], batch size: 517, lr: 5.82e-03, grad_scale: 32.0 2024-10-08 19:20:25,732 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=427056.0, ans=0.125 2024-10-08 19:20:44,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=427176.0, ans=0.025 2024-10-08 19:20:46,512 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=427296.0, ans=0.02 2024-10-08 19:20:57,603 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.26 vs. limit=12.0 2024-10-08 19:21:12,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=427416.0, ans=0.125 2024-10-08 19:21:13,982 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.622e+02 4.083e+02 4.517e+02 5.277e+02 7.029e+02, threshold=9.033e+02, percent-clipped=0.0 2024-10-08 19:21:43,656 INFO [train.py:1136] (0/2) Epoch 44, batch 550, loss[loss=0.1692, simple_loss=0.2682, pruned_loss=0.03511, over 87348.00 frames. ], tot_loss[loss=0.1699, simple_loss=0.2714, pruned_loss=0.03423, over 16079754.99 frames. 
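
The grad_scale field is the fp16 loss-scale factor from mixed-precision training: it halves when an inf/nan gradient is detected (32.0 down to 16.0 around epoch 43 batch 750 and again early in epoch 44) and is raised back after clean steps (32.0 again by epoch 44 batch 0, and by batch 400 above). The quick recovery suggests a custom raise-back policy layered on top of the stock scaler, but the core loop is standard torch.cuda.amp usage; a generic sketch, not the icefall loop:

```python
import torch

model = torch.nn.Linear(80, 500).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=6e-3)
scaler = torch.cuda.amp.GradScaler()  # initial scale is an assumption here

for step in range(100):
    x = torch.randn(16, 80, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(x).square().mean()
    scaler.scale(loss).backward()  # backprop on loss * scale
    scaler.step(optimizer)         # unscales grads; skips the step on inf/nan
    scaler.update()                # halves the scale on overflow, grows it later
    # scaler.get_scale() is what train.py prints as grad_scale
```
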
], batch size: 280, lr: 5.82e-03, grad_scale: 32.0 2024-10-08 19:21:49,377 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=427656.0, ans=0.0 2024-10-08 19:22:05,637 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=427776.0, ans=0.0 2024-10-08 19:22:47,815 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.73 vs. limit=22.5 2024-10-08 19:23:19,396 INFO [train.py:1136] (0/2) Epoch 44, batch 600, loss[loss=0.1643, simple_loss=0.2634, pruned_loss=0.03262, over 87408.00 frames. ], tot_loss[loss=0.1698, simple_loss=0.271, pruned_loss=0.03433, over 16308634.62 frames. ], batch size: 372, lr: 5.81e-03, grad_scale: 32.0 2024-10-08 19:23:28,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=428256.0, ans=0.0 2024-10-08 19:24:17,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=428616.0, ans=0.125 2024-10-08 19:24:21,948 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.488e+02 3.911e+02 4.232e+02 4.578e+02 6.486e+02, threshold=8.464e+02, percent-clipped=0.0 2024-10-08 19:24:35,480 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.28 vs. limit=15.0 2024-10-08 19:24:38,208 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=428736.0, ans=0.125 2024-10-08 19:24:54,608 INFO [train.py:1136] (0/2) Epoch 44, batch 650, loss[loss=0.1663, simple_loss=0.2706, pruned_loss=0.03102, over 87226.00 frames. ], tot_loss[loss=0.169, simple_loss=0.2703, pruned_loss=0.03388, over 16520200.67 frames. ], batch size: 415, lr: 5.81e-03, grad_scale: 32.0 2024-10-08 19:25:08,695 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=428856.0, ans=0.125 2024-10-08 19:25:40,897 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=429096.0, ans=0.2 2024-10-08 19:25:53,050 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-10-08 19:26:19,093 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=429336.0, ans=0.125 2024-10-08 19:26:21,993 INFO [train.py:1136] (0/2) Epoch 44, batch 700, loss[loss=0.1638, simple_loss=0.2683, pruned_loss=0.02962, over 87266.00 frames. ], tot_loss[loss=0.1696, simple_loss=0.2707, pruned_loss=0.03427, over 16659042.09 frames. 
], batch size: 439, lr: 5.80e-03, grad_scale: 16.0 2024-10-08 19:26:38,521 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=429576.0, ans=0.0 2024-10-08 19:27:04,104 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-10-08 19:27:19,486 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.422e+02 3.891e+02 4.269e+02 4.941e+02 6.774e+02, threshold=8.539e+02, percent-clipped=0.0 2024-10-08 19:27:22,252 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.82 vs. limit=15.0 2024-10-08 19:27:31,109 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=429936.0, ans=0.125 2024-10-08 19:27:45,081 INFO [train.py:1136] (0/2) Epoch 44, batch 750, loss[loss=0.2079, simple_loss=0.3068, pruned_loss=0.05447, over 78322.00 frames. ], tot_loss[loss=0.1698, simple_loss=0.2708, pruned_loss=0.03445, over 16769250.06 frames. ], batch size: 1493, lr: 5.80e-03, grad_scale: 16.0 2024-10-08 19:27:46,937 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=430056.0, ans=0.0 2024-10-08 19:27:49,241 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.29 vs. limit=22.5 2024-10-08 19:28:19,964 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=430296.0, ans=0.1 2024-10-08 19:28:36,069 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.57 vs. limit=6.0 2024-10-08 19:28:39,673 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.57 vs. limit=22.5 2024-10-08 19:29:03,999 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=430536.0, ans=0.1 2024-10-08 19:29:08,493 INFO [train.py:1136] (0/2) Epoch 44, batch 800, loss[loss=0.1871, simple_loss=0.2842, pruned_loss=0.04497, over 69755.00 frames. ], tot_loss[loss=0.171, simple_loss=0.272, pruned_loss=0.03502, over 16801569.61 frames. ], batch size: 1960, lr: 5.80e-03, grad_scale: 32.0 2024-10-08 19:29:17,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=430656.0, ans=0.125 2024-10-08 19:29:22,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=430656.0, ans=0.125 2024-10-08 19:29:27,665 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.39 vs. limit=15.0 2024-10-08 19:29:36,515 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-44.pt 2024-10-08 19:30:15,903 INFO [train.py:1136] (0/2) Epoch 45, batch 0, loss[loss=0.164, simple_loss=0.2754, pruned_loss=0.02629, over 87459.00 frames. ], tot_loss[loss=0.164, simple_loss=0.2754, pruned_loss=0.02629, over 87459.00 frames. 
], batch size: 490, lr: 5.73e-03, grad_scale: 32.0 2024-10-08 19:30:15,904 INFO [train.py:1159] (0/2) Computing validation loss 2024-10-08 19:30:26,772 INFO [train.py:1168] (0/2) Epoch 45, validation: loss=0.1676, simple_loss=0.2778, pruned_loss=0.02871, over 1382211.00 frames. 2024-10-08 19:30:26,772 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB 2024-10-08 19:31:03,232 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.569e+02 4.043e+02 4.965e+02 5.625e+02 1.111e+03, threshold=9.930e+02, percent-clipped=1.0 2024-10-08 19:31:15,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=431088.0, ans=0.0 2024-10-08 19:31:25,874 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=431208.0, ans=0.125 2024-10-08 19:31:27,635 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=431208.0, ans=0.125 2024-10-08 19:31:43,723 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=431328.0, ans=0.125 2024-10-08 19:32:02,476 INFO [train.py:1136] (0/2) Epoch 45, batch 50, loss[loss=0.1789, simple_loss=0.2799, pruned_loss=0.03901, over 86828.00 frames. ], tot_loss[loss=0.1713, simple_loss=0.2729, pruned_loss=0.03486, over 3844762.25 frames. ], batch size: 547, lr: 5.73e-03, grad_scale: 32.0 2024-10-08 19:32:36,805 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=431568.0, ans=0.125 2024-10-08 19:32:45,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=431688.0, ans=0.1 2024-10-08 19:32:52,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=431688.0, ans=0.0 2024-10-08 19:33:01,097 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.64 vs. limit=15.0 2024-10-08 19:33:03,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=431808.0, ans=0.125 2024-10-08 19:33:23,286 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=431928.0, ans=0.125 2024-10-08 19:33:26,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=431928.0, ans=0.125 2024-10-08 19:33:28,102 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/checkpoint-36000.pt 2024-10-08 19:33:38,973 INFO [train.py:1136] (0/2) Epoch 45, batch 100, loss[loss=0.1766, simple_loss=0.2778, pruned_loss=0.03769, over 87011.00 frames. ], tot_loss[loss=0.1711, simple_loss=0.2725, pruned_loss=0.03486, over 6789083.87 frames. 
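
Two checkpoint cadences show up in the checkpoint.py:75 lines: an epoch-end file after each ~800-batch pass (zipformer/exp/epoch-43.pt through epoch-45.pt) and batch-count files such as zipformer/exp/checkpoint-36000.pt, evidently written on a fixed batch interval with old ones pruned. A sketch of the batch-interval half; the interval of 4000 and the retention count are assumptions, chosen only to be consistent with 36000 landing on the grid:

```python
from pathlib import Path
import torch

def maybe_save_batch_checkpoint(model, exp_dir: Path, batch_idx_train: int,
                                save_every_n: int = 4000, keep_last_k: int = 30):
    """Write checkpoint-<batch>.pt every save_every_n batches, keeping
    only the newest keep_last_k. Cadence sketch; knob values assumed."""
    if batch_idx_train == 0 or batch_idx_train % save_every_n != 0:
        return
    torch.save({"model": model.state_dict(),
                "batch_idx_train": batch_idx_train},
               exp_dir / f"checkpoint-{batch_idx_train}.pt")
    old = sorted(exp_dir.glob("checkpoint-*.pt"),
                 key=lambda p: int(p.stem.split("-")[1]))
    for p in old[:-keep_last_k]:  # prune everything but the newest k
        p.unlink()
```
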
], batch size: 583, lr: 5.72e-03, grad_scale: 16.0 2024-10-08 19:33:56,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=432168.0, ans=0.125 2024-10-08 19:34:08,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=432168.0, ans=0.07 2024-10-08 19:34:15,814 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.406e+02 4.004e+02 4.381e+02 5.095e+02 7.545e+02, threshold=8.763e+02, percent-clipped=0.0 2024-10-08 19:34:21,330 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=432288.0, ans=0.025 2024-10-08 19:34:27,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=432288.0, ans=0.025 2024-10-08 19:34:47,992 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.97 vs. limit=15.0 2024-10-08 19:34:49,348 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=432408.0, ans=0.0 2024-10-08 19:35:14,616 INFO [train.py:1136] (0/2) Epoch 45, batch 150, loss[loss=0.1654, simple_loss=0.2567, pruned_loss=0.03707, over 85994.00 frames. ], tot_loss[loss=0.1711, simple_loss=0.2722, pruned_loss=0.03502, over 9085688.09 frames. ], batch size: 180, lr: 5.72e-03, grad_scale: 16.0 2024-10-08 19:35:30,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=432768.0, ans=0.125 2024-10-08 19:35:42,802 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.09 vs. limit=15.0 2024-10-08 19:36:02,085 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=432888.0, ans=0.0 2024-10-08 19:36:23,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=433008.0, ans=0.07 2024-10-08 19:36:24,854 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=433008.0, ans=0.125 2024-10-08 19:36:40,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=433128.0, ans=0.0 2024-10-08 19:36:45,127 INFO [train.py:1136] (0/2) Epoch 45, batch 200, loss[loss=0.1654, simple_loss=0.2625, pruned_loss=0.03413, over 87317.00 frames. ], tot_loss[loss=0.1707, simple_loss=0.2721, pruned_loss=0.03465, over 10841466.76 frames. 
], batch size: 296, lr: 5.71e-03, grad_scale: 16.0 2024-10-08 19:36:55,304 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=433248.0, ans=0.1 2024-10-08 19:37:17,644 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=433368.0, ans=0.125 2024-10-08 19:37:17,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=433368.0, ans=0.125 2024-10-08 19:37:19,228 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=433368.0, ans=0.1 2024-10-08 19:37:25,543 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.228e+02 4.046e+02 4.503e+02 4.934e+02 6.173e+02, threshold=9.005e+02, percent-clipped=0.0 2024-10-08 19:38:01,955 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=433608.0, ans=0.0 2024-10-08 19:38:03,622 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=433728.0, ans=0.0 2024-10-08 19:38:07,199 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=433728.0, ans=0.125 2024-10-08 19:38:19,455 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=433728.0, ans=0.0 2024-10-08 19:38:22,432 INFO [train.py:1136] (0/2) Epoch 45, batch 250, loss[loss=0.1625, simple_loss=0.2614, pruned_loss=0.03173, over 87156.00 frames. ], tot_loss[loss=0.1708, simple_loss=0.2721, pruned_loss=0.0348, over 12219920.83 frames. ], batch size: 313, lr: 5.71e-03, grad_scale: 16.0 2024-10-08 19:38:39,672 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=433848.0, ans=0.1 2024-10-08 19:38:48,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=433968.0, ans=0.2 2024-10-08 19:38:57,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=433968.0, ans=0.125 2024-10-08 19:39:24,918 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=434208.0, ans=0.0 2024-10-08 19:39:46,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=434328.0, ans=0.2 2024-10-08 19:39:58,300 INFO [train.py:1136] (0/2) Epoch 45, batch 300, loss[loss=0.2, simple_loss=0.3012, pruned_loss=0.04933, over 78806.00 frames. ], tot_loss[loss=0.1709, simple_loss=0.2719, pruned_loss=0.03494, over 13301328.71 frames. ], batch size: 1493, lr: 5.71e-03, grad_scale: 16.0 2024-10-08 19:39:59,600 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.50 vs. 
limit=10.0 2024-10-08 19:40:28,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=434568.0, ans=0.1 2024-10-08 19:40:37,465 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.435e+02 4.003e+02 4.428e+02 5.052e+02 7.645e+02, threshold=8.856e+02, percent-clipped=0.0 2024-10-08 19:41:21,853 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.59 vs. limit=15.0 2024-10-08 19:41:23,061 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=434928.0, ans=0.125 2024-10-08 19:41:33,462 INFO [train.py:1136] (0/2) Epoch 45, batch 350, loss[loss=0.1657, simple_loss=0.2617, pruned_loss=0.03483, over 87250.00 frames. ], tot_loss[loss=0.1704, simple_loss=0.2714, pruned_loss=0.0347, over 14170597.69 frames. ], batch size: 296, lr: 5.70e-03, grad_scale: 16.0 2024-10-08 19:41:47,882 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=435048.0, ans=0.1 2024-10-08 19:41:53,300 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=435168.0, ans=0.0 2024-10-08 19:42:00,325 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=435168.0, ans=0.125 2024-10-08 19:42:10,152 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.72 vs. limit=22.5 2024-10-08 19:42:12,881 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=435288.0, ans=0.125 2024-10-08 19:42:13,579 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.10 vs. limit=22.5 2024-10-08 19:42:29,244 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=435408.0, ans=0.1 2024-10-08 19:42:53,164 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=435528.0, ans=0.0 2024-10-08 19:43:07,087 INFO [train.py:1136] (0/2) Epoch 45, batch 400, loss[loss=0.1843, simple_loss=0.286, pruned_loss=0.04129, over 85326.00 frames. ], tot_loss[loss=0.1713, simple_loss=0.2725, pruned_loss=0.03511, over 14797583.76 frames. ], batch size: 866, lr: 5.70e-03, grad_scale: 32.0 2024-10-08 19:43:43,766 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.505e+02 3.887e+02 4.154e+02 4.759e+02 6.652e+02, threshold=8.308e+02, percent-clipped=0.0 2024-10-08 19:44:21,883 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=2.84 vs. limit=12.0 2024-10-08 19:44:40,149 INFO [train.py:1136] (0/2) Epoch 45, batch 450, loss[loss=0.1721, simple_loss=0.2719, pruned_loss=0.03616, over 87004.00 frames. ], tot_loss[loss=0.1709, simple_loss=0.2721, pruned_loss=0.03482, over 15320919.94 frames. 
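
The tot_loss[..., over N frames] figures are not a plain cumulative average: within epoch 43 the frame count climbs 12.29M -> 13.37M -> 14.20M -> 14.84M -> 15.36M, each 50-batch increment about 0.78x the previous one. That geometric shrinkage matches an exponentially decayed accumulator with a per-batch decay near 0.995 (0.995**50 ~= 0.78), saturating around 87000 / (1 - 0.995) ~= 17.4M frames, which is just where the late-epoch counts level off before resetting at batch 0. A sketch of that bookkeeping, with the decay inferred from the log rather than taken from the code:

```python
class RunningLoss:
    """Exponentially decayed (loss * frames, frames) accumulator whose
    ratio is printed as tot_loss[...]. Decay 0.995 is inferred."""

    def __init__(self, decay=0.995):
        self.decay = decay
        self.loss_frames = 0.0
        self.frames = 0.0

    def update(self, loss, num_frames):
        self.loss_frames = self.loss_frames * self.decay + loss * num_frames
        self.frames = self.frames * self.decay + num_frames
        return self.loss_frames / self.frames, self.frames

r = RunningLoss()
for _ in range(750):
    avg, n = r.update(0.17, 87000.0)
print(round(n))  # ~17.0M frames: saturated, as in the late-epoch entries
```
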
], batch size: 583, lr: 5.70e-03, grad_scale: 32.0 2024-10-08 19:44:42,084 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=436248.0, ans=0.125 2024-10-08 19:44:43,755 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=436248.0, ans=0.125 2024-10-08 19:45:09,414 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.83 vs. limit=15.0 2024-10-08 19:45:31,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=436488.0, ans=0.125 2024-10-08 19:45:34,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=436488.0, ans=0.1 2024-10-08 19:46:17,760 INFO [train.py:1136] (0/2) Epoch 45, batch 500, loss[loss=0.1785, simple_loss=0.2852, pruned_loss=0.03585, over 84498.00 frames. ], tot_loss[loss=0.1706, simple_loss=0.2719, pruned_loss=0.03462, over 15716348.45 frames. ], batch size: 958, lr: 5.69e-03, grad_scale: 16.0 2024-10-08 19:46:23,919 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=436848.0, ans=0.04949747468305833 2024-10-08 19:46:30,943 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=436848.0, ans=0.125 2024-10-08 19:46:49,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=436968.0, ans=0.1 2024-10-08 19:46:56,489 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=437088.0, ans=0.1 2024-10-08 19:46:59,424 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.455e+02 4.158e+02 4.551e+02 5.042e+02 7.227e+02, threshold=9.102e+02, percent-clipped=0.0 2024-10-08 19:47:16,794 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=437208.0, ans=0.125 2024-10-08 19:47:34,801 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=437328.0, ans=0.1 2024-10-08 19:47:48,702 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=2.92 vs. limit=12.0 2024-10-08 19:47:49,827 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=437328.0, ans=0.0 2024-10-08 19:47:50,314 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.23 vs. limit=15.0 2024-10-08 19:47:54,362 INFO [train.py:1136] (0/2) Epoch 45, batch 550, loss[loss=0.1762, simple_loss=0.2786, pruned_loss=0.03693, over 86199.00 frames. ], tot_loss[loss=0.1706, simple_loss=0.2718, pruned_loss=0.03472, over 16011140.69 frames. ], batch size: 667, lr: 5.69e-03, grad_scale: 16.0 2024-10-08 19:47:55,103 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.24 vs. 
limit=15.0 2024-10-08 19:47:59,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=437448.0, ans=0.025 2024-10-08 19:48:20,356 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=437568.0, ans=0.125 2024-10-08 19:48:32,516 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=437688.0, ans=0.125 2024-10-08 19:48:34,150 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=437688.0, ans=0.125 2024-10-08 19:48:50,116 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.16 vs. limit=22.5 2024-10-08 19:48:55,081 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=437808.0, ans=0.1 2024-10-08 19:48:58,491 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=437808.0, ans=0.5 2024-10-08 19:49:26,359 INFO [train.py:1136] (0/2) Epoch 45, batch 600, loss[loss=0.1668, simple_loss=0.266, pruned_loss=0.03376, over 87232.00 frames. ], tot_loss[loss=0.17, simple_loss=0.2713, pruned_loss=0.03432, over 16286984.64 frames. ], batch size: 350, lr: 5.68e-03, grad_scale: 16.0 2024-10-08 19:49:39,395 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=438048.0, ans=0.125 2024-10-08 19:49:56,912 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=438168.0, ans=0.125 2024-10-08 19:50:05,018 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.520e+02 3.861e+02 4.116e+02 4.607e+02 6.169e+02, threshold=8.232e+02, percent-clipped=0.0 2024-10-08 19:50:05,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=438288.0, ans=0.0 2024-10-08 19:50:08,888 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=438288.0, ans=0.2 2024-10-08 19:50:16,540 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.46 vs. limit=15.0 2024-10-08 19:50:20,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=438288.0, ans=0.125 2024-10-08 19:50:31,461 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=438408.0, ans=0.1 2024-10-08 19:50:39,728 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=438408.0, ans=0.0 2024-10-08 19:50:46,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=438528.0, ans=0.0 2024-10-08 19:50:52,503 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.81 vs. 
limit=15.0 2024-10-08 19:50:55,205 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=438528.0, ans=0.125 2024-10-08 19:50:59,972 INFO [train.py:1136] (0/2) Epoch 45, batch 650, loss[loss=0.1547, simple_loss=0.2562, pruned_loss=0.02664, over 87029.00 frames. ], tot_loss[loss=0.1697, simple_loss=0.271, pruned_loss=0.03425, over 16479279.71 frames. ], batch size: 264, lr: 5.68e-03, grad_scale: 16.0 2024-10-08 19:51:43,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=438888.0, ans=0.1 2024-10-08 19:52:22,266 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=439128.0, ans=0.0 2024-10-08 19:52:23,864 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=439128.0, ans=0.1 2024-10-08 19:52:23,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=439128.0, ans=0.04949747468305833 2024-10-08 19:52:30,313 INFO [train.py:1136] (0/2) Epoch 45, batch 700, loss[loss=0.1681, simple_loss=0.2743, pruned_loss=0.03094, over 87360.00 frames. ], tot_loss[loss=0.1699, simple_loss=0.2711, pruned_loss=0.03429, over 16638435.26 frames. ], batch size: 490, lr: 5.68e-03, grad_scale: 16.0 2024-10-08 19:52:36,357 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.03 vs. limit=15.0 2024-10-08 19:52:47,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=439368.0, ans=0.1 2024-10-08 19:52:47,638 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.35 vs. limit=22.5 2024-10-08 19:52:48,248 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.80 vs. limit=15.0 2024-10-08 19:52:55,579 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.76 vs. limit=10.0 2024-10-08 19:53:06,503 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.549e+02 3.978e+02 4.413e+02 5.030e+02 8.183e+02, threshold=8.827e+02, percent-clipped=0.0 2024-10-08 19:53:12,739 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.75 vs. limit=15.0 2024-10-08 19:53:31,029 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=439608.0, ans=0.125 2024-10-08 19:53:35,850 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=439608.0, ans=0.125 2024-10-08 19:53:55,583 INFO [train.py:1136] (0/2) Epoch 45, batch 750, loss[loss=0.1596, simple_loss=0.266, pruned_loss=0.02659, over 87481.00 frames. ], tot_loss[loss=0.1704, simple_loss=0.2719, pruned_loss=0.03445, over 16732075.78 frames. ], batch size: 490, lr: 5.67e-03, grad_scale: 16.0 2024-10-08 19:54:04,704 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=15.79 vs. 
limit=15.0 2024-10-08 19:54:21,533 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=439968.0, ans=0.0 2024-10-08 19:54:29,666 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.95 vs. limit=12.0 2024-10-08 19:54:42,992 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.39 vs. limit=15.0 2024-10-08 19:54:49,130 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=440208.0, ans=0.125 2024-10-08 19:55:19,400 INFO [train.py:1136] (0/2) Epoch 45, batch 800, loss[loss=0.1573, simple_loss=0.255, pruned_loss=0.02977, over 86713.00 frames. ], tot_loss[loss=0.1706, simple_loss=0.2723, pruned_loss=0.03452, over 16736984.15 frames. ], batch size: 229, lr: 5.67e-03, grad_scale: 32.0 2024-10-08 19:55:29,446 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=440448.0, ans=0.0 2024-10-08 19:55:44,615 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-45.pt 2024-10-08 19:56:24,670 INFO [train.py:1136] (0/2) Epoch 46, batch 0, loss[loss=0.1745, simple_loss=0.2792, pruned_loss=0.03486, over 86472.00 frames. ], tot_loss[loss=0.1745, simple_loss=0.2792, pruned_loss=0.03486, over 86472.00 frames. ], batch size: 667, lr: 5.61e-03, grad_scale: 32.0 2024-10-08 19:56:24,672 INFO [train.py:1159] (0/2) Computing validation loss 2024-10-08 19:56:36,611 INFO [train.py:1168] (0/2) Epoch 46, validation: loss=0.1662, simple_loss=0.2741, pruned_loss=0.02916, over 1382211.00 frames. 2024-10-08 19:56:36,612 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB 2024-10-08 19:56:42,469 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=440640.0, ans=0.125 2024-10-08 19:56:45,368 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.633e+02 4.157e+02 4.734e+02 5.557e+02 8.947e+02, threshold=9.467e+02, percent-clipped=1.0 2024-10-08 19:56:49,112 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=440640.0, ans=0.125 2024-10-08 19:56:52,851 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=440760.0, ans=0.125 2024-10-08 19:56:54,415 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-10-08 19:56:59,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=440760.0, ans=0.0 2024-10-08 19:57:16,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=440880.0, ans=0.1 2024-10-08 19:57:30,234 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=440880.0, ans=0.1 2024-10-08 19:57:37,126 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=441000.0, ans=0.125 2024-10-08 19:58:11,396 INFO [train.py:1136] (0/2) Epoch 46, batch 50, loss[loss=0.1625, simple_loss=0.2637, pruned_loss=0.03059, over 87032.00 frames. 
], tot_loss[loss=0.1716, simple_loss=0.2717, pruned_loss=0.03576, over 3850496.96 frames. ], batch size: 350, lr: 5.60e-03, grad_scale: 32.0 2024-10-08 19:59:21,636 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.78 vs. limit=15.0 2024-10-08 19:59:36,558 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=441720.0, ans=0.125 2024-10-08 19:59:47,067 INFO [train.py:1136] (0/2) Epoch 46, batch 100, loss[loss=0.1608, simple_loss=0.2643, pruned_loss=0.02868, over 87412.00 frames. ], tot_loss[loss=0.1713, simple_loss=0.2722, pruned_loss=0.03519, over 6790238.09 frames. ], batch size: 393, lr: 5.60e-03, grad_scale: 32.0 2024-10-08 19:59:52,442 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.46 vs. limit=15.0 2024-10-08 20:00:00,269 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.313e+02 3.885e+02 4.321e+02 4.915e+02 7.477e+02, threshold=8.643e+02, percent-clipped=0.0 2024-10-08 20:00:13,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=441960.0, ans=0.125 2024-10-08 20:00:24,747 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. limit=6.0 2024-10-08 20:00:49,890 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.05 vs. limit=12.0 2024-10-08 20:01:06,786 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=442320.0, ans=0.1 2024-10-08 20:01:23,193 INFO [train.py:1136] (0/2) Epoch 46, batch 150, loss[loss=0.1643, simple_loss=0.271, pruned_loss=0.02878, over 87350.00 frames. ], tot_loss[loss=0.1716, simple_loss=0.2725, pruned_loss=0.03538, over 9054011.29 frames. ], batch size: 439, lr: 5.59e-03, grad_scale: 8.0 2024-10-08 20:01:30,362 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=442440.0, ans=0.125 2024-10-08 20:01:45,077 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=442560.0, ans=0.0 2024-10-08 20:02:01,423 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=442680.0, ans=0.125 2024-10-08 20:02:04,748 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=442680.0, ans=0.125 2024-10-08 20:02:10,059 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=442680.0, ans=0.125 2024-10-08 20:02:32,289 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.75 vs. limit=15.0 2024-10-08 20:02:40,485 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.44 vs. limit=15.0 2024-10-08 20:02:58,538 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.15 vs. 
2024-10-08 20:02:59,328 INFO [train.py:1136] (0/2) Epoch 46, batch 200, loss[loss=0.1624, simple_loss=0.258, pruned_loss=0.03339, over 86692.00 frames. ], tot_loss[loss=0.1703, simple_loss=0.2712, pruned_loss=0.03469, over 10851781.79 frames. ], batch size: 213, lr: 5.59e-03, grad_scale: 8.0
2024-10-08 20:02:59,848 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=443040.0, ans=0.1
2024-10-08 20:03:11,258 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.430e+02 4.044e+02 4.415e+02 5.023e+02 1.236e+03, threshold=8.829e+02, percent-clipped=3.0
2024-10-08 20:03:13,589 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=443040.0, ans=0.2
2024-10-08 20:03:41,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=443280.0, ans=0.1
2024-10-08 20:04:21,874 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=443520.0, ans=0.2
2024-10-08 20:04:25,355 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=443520.0, ans=0.125
2024-10-08 20:04:31,892 INFO [train.py:1136] (0/2) Epoch 46, batch 250, loss[loss=0.1803, simple_loss=0.2805, pruned_loss=0.04009, over 86245.00 frames. ], tot_loss[loss=0.1702, simple_loss=0.2714, pruned_loss=0.03448, over 12242741.70 frames. ], batch size: 620, lr: 5.59e-03, grad_scale: 8.0
2024-10-08 20:04:32,953 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.47 vs. limit=15.0
2024-10-08 20:04:47,826 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.50 vs. limit=15.0
2024-10-08 20:04:51,969 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=443760.0, ans=0.125
2024-10-08 20:04:53,874 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=443760.0, ans=0.1
2024-10-08 20:04:55,537 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=443760.0, ans=0.1
2024-10-08 20:05:23,834 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=443880.0, ans=0.125
2024-10-08 20:05:30,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=444000.0, ans=0.1
2024-10-08 20:05:36,440 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=444000.0, ans=0.125
2024-10-08 20:05:59,066 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=444120.0, ans=0.125
2024-10-08 20:06:00,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=444120.0, ans=0.0
2024-10-08 20:06:05,569 INFO [train.py:1136] (0/2) Epoch 46, batch 300, loss[loss=0.1672, simple_loss=0.2595, pruned_loss=0.0374, over 86187.00 frames. ], tot_loss[loss=0.1698, simple_loss=0.2708, pruned_loss=0.03442, over 13309620.55 frames. ], batch size: 197, lr: 5.58e-03, grad_scale: 8.0
2024-10-08 20:06:10,207 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=444240.0, ans=0.125
2024-10-08 20:06:18,792 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=444240.0, ans=0.07
2024-10-08 20:06:21,748 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.380e+02 3.928e+02 4.344e+02 4.860e+02 6.281e+02, threshold=8.689e+02, percent-clipped=0.0
2024-10-08 20:06:34,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=444360.0, ans=0.0
2024-10-08 20:06:55,209 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=444480.0, ans=0.1
2024-10-08 20:07:16,859 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=444600.0, ans=0.125
2024-10-08 20:07:30,745 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=444720.0, ans=0.09899494936611666
2024-10-08 20:07:41,141 INFO [train.py:1136] (0/2) Epoch 46, batch 350, loss[loss=0.1593, simple_loss=0.2574, pruned_loss=0.03054, over 87050.00 frames. ], tot_loss[loss=0.1702, simple_loss=0.2713, pruned_loss=0.03458, over 14114142.32 frames. ], batch size: 264, lr: 5.58e-03, grad_scale: 4.0
2024-10-08 20:07:41,596 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=444840.0, ans=0.125
2024-10-08 20:08:17,002 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=445080.0, ans=0.95
2024-10-08 20:08:23,904 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=445080.0, ans=0.125
2024-10-08 20:08:23,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=445080.0, ans=0.1
2024-10-08 20:08:41,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=445200.0, ans=0.07
2024-10-08 20:09:14,462 INFO [train.py:1136] (0/2) Epoch 46, batch 400, loss[loss=0.1825, simple_loss=0.2863, pruned_loss=0.03937, over 84428.00 frames. ], tot_loss[loss=0.1697, simple_loss=0.2711, pruned_loss=0.0342, over 14794004.04 frames. ], batch size: 957, lr: 5.58e-03, grad_scale: 8.0
2024-10-08 20:09:27,499 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-10-08 20:09:30,024 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.482e+02 4.160e+02 4.630e+02 5.132e+02 8.225e+02, threshold=9.260e+02, percent-clipped=0.0
2024-10-08 20:09:42,226 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=445560.0, ans=10.0
2024-10-08 20:09:48,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=445560.0, ans=0.0
2024-10-08 20:09:53,703 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=445680.0, ans=0.02
2024-10-08 20:10:24,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=445800.0, ans=0.025
2024-10-08 20:10:27,204 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.80 vs. limit=22.5
2024-10-08 20:10:28,666 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.22 vs. limit=12.0
2024-10-08 20:10:44,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=445920.0, ans=0.0
2024-10-08 20:10:45,862 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=445920.0, ans=0.125
2024-10-08 20:10:49,061 INFO [train.py:1136] (0/2) Epoch 46, batch 450, loss[loss=0.1665, simple_loss=0.2649, pruned_loss=0.03403, over 87246.00 frames. ], tot_loss[loss=0.1698, simple_loss=0.2711, pruned_loss=0.03421, over 15322872.79 frames. ], batch size: 313, lr: 5.57e-03, grad_scale: 8.0
2024-10-08 20:11:13,627 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.75 vs. limit=15.0
2024-10-08 20:11:20,809 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=446160.0, ans=0.125
2024-10-08 20:12:17,370 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.37 vs. limit=10.0
2024-10-08 20:12:25,199 INFO [train.py:1136] (0/2) Epoch 46, batch 500, loss[loss=0.1803, simple_loss=0.2794, pruned_loss=0.04058, over 69352.00 frames. ], tot_loss[loss=0.1699, simple_loss=0.2715, pruned_loss=0.03417, over 15730432.88 frames. ], batch size: 1960, lr: 5.57e-03, grad_scale: 8.0
2024-10-08 20:12:38,626 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.414e+02 3.810e+02 4.182e+02 4.703e+02 6.716e+02, threshold=8.364e+02, percent-clipped=0.0
2024-10-08 20:12:51,425 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-10-08 20:12:55,116 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.68 vs. limit=15.0
2024-10-08 20:12:55,262 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.89 vs. limit=15.0
2024-10-08 20:13:57,604 INFO [train.py:1136] (0/2) Epoch 46, batch 550, loss[loss=0.1567, simple_loss=0.2538, pruned_loss=0.0298, over 86682.00 frames. ], tot_loss[loss=0.1695, simple_loss=0.2709, pruned_loss=0.03407, over 16047066.46 frames. ], batch size: 246, lr: 5.57e-03, grad_scale: 8.0
2024-10-08 20:14:01,656 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=447240.0, ans=0.1
2024-10-08 20:14:12,586 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=447240.0, ans=0.125
2024-10-08 20:14:21,869 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=447360.0, ans=0.0
2024-10-08 20:14:34,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=447480.0, ans=0.125
2024-10-08 20:14:37,910 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=447480.0, ans=0.0
2024-10-08 20:14:37,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=447480.0, ans=0.0
2024-10-08 20:14:53,298 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.65 vs. limit=15.0
2024-10-08 20:14:56,658 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.61 vs. limit=15.0
2024-10-08 20:15:08,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=447600.0, ans=0.1
2024-10-08 20:15:13,672 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=447720.0, ans=0.125
2024-10-08 20:15:15,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=447720.0, ans=0.125
2024-10-08 20:15:34,638 INFO [train.py:1136] (0/2) Epoch 46, batch 600, loss[loss=0.1626, simple_loss=0.2567, pruned_loss=0.03429, over 86417.00 frames. ], tot_loss[loss=0.1695, simple_loss=0.271, pruned_loss=0.03403, over 16255766.59 frames. ], batch size: 197, lr: 5.56e-03, grad_scale: 8.0
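The ScheduledFloat entries record hyperparameters (dropout probabilities, skip rates, balancer and whitening limits) that are functions of the global batch_count rather than constants; by this point in training most have settled at their final values, e.g. the feed_forward out_proj.dropout_p entries all read ans=0.1. A minimal piecewise-linear version of the idea, with assumed schedule endpoints (illustrative, not scaling.py's actual implementation):

import bisect

class ScheduledFloatSketch:
    """Piecewise-linear schedule over batch_count, e.g.
    ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1)) decays 0.3 -> 0.1."""
    def __init__(self, *points):
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def __call__(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

dropout_p = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
print(dropout_p(447240.0))  # long past the final point -> 0.1, as logged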
2024-10-08 20:15:40,365 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=447840.0, ans=0.1
2024-10-08 20:15:48,771 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.297e+02 3.876e+02 4.260e+02 4.800e+02 6.752e+02, threshold=8.520e+02, percent-clipped=0.0
2024-10-08 20:15:57,317 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=447960.0, ans=15.0
2024-10-08 20:16:10,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=448080.0, ans=0.125
2024-10-08 20:16:13,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=448080.0, ans=0.125
2024-10-08 20:16:15,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=448080.0, ans=0.125
2024-10-08 20:16:29,574 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=448200.0, ans=0.125
2024-10-08 20:16:52,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=448320.0, ans=0.125
2024-10-08 20:16:54,324 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=448320.0, ans=0.0
2024-10-08 20:16:57,722 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=448320.0, ans=0.07
2024-10-08 20:17:04,312 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=448320.0, ans=0.125
2024-10-08 20:17:07,343 INFO [train.py:1136] (0/2) Epoch 46, batch 650, loss[loss=0.1646, simple_loss=0.2588, pruned_loss=0.03522, over 87247.00 frames. ], tot_loss[loss=0.1696, simple_loss=0.2713, pruned_loss=0.03393, over 16461130.62 frames. ], batch size: 280, lr: 5.56e-03, grad_scale: 8.0
2024-10-08 20:17:11,492 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.16 vs. limit=22.5
2024-10-08 20:17:34,795 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=448560.0, ans=0.0
2024-10-08 20:17:35,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2.whitening_limit, batch_count=448560.0, ans=15.0
2024-10-08 20:17:39,954 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=448560.0, ans=0.125
2024-10-08 20:18:16,958 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=448800.0, ans=0.0
2024-10-08 20:18:27,039 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.27 vs. limit=15.0
2024-10-08 20:18:39,221 INFO [train.py:1136] (0/2) Epoch 46, batch 700, loss[loss=0.1892, simple_loss=0.2843, pruned_loss=0.04705, over 69735.00 frames. ], tot_loss[loss=0.1701, simple_loss=0.2717, pruned_loss=0.0342, over 16557588.09 frames. ], batch size: 1960, lr: 5.55e-03, grad_scale: 8.0
2024-10-08 20:18:52,020 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.540e+02 3.930e+02 4.253e+02 4.850e+02 6.166e+02, threshold=8.506e+02, percent-clipped=0.0
2024-10-08 20:18:54,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=449040.0, ans=0.0
2024-10-08 20:19:00,477 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=449160.0, ans=0.125
2024-10-08 20:19:00,514 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=449160.0, ans=0.07
2024-10-08 20:19:13,977 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.89 vs. limit=6.0
2024-10-08 20:19:32,529 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=449400.0, ans=0.0
2024-10-08 20:19:35,657 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=449400.0, ans=0.0
2024-10-08 20:19:45,233 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=449520.0, ans=0.0
2024-10-08 20:20:02,191 INFO [train.py:1136] (0/2) Epoch 46, batch 750, loss[loss=0.1796, simple_loss=0.2839, pruned_loss=0.03765, over 85859.00 frames. ], tot_loss[loss=0.17, simple_loss=0.2718, pruned_loss=0.0341, over 16696484.55 frames. ], batch size: 721, lr: 5.55e-03, grad_scale: 8.0
2024-10-08 20:20:11,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=449640.0, ans=0.125
2024-10-08 20:20:26,227 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=449760.0, ans=0.025
2024-10-08 20:20:35,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=449880.0, ans=0.0
2024-10-08 20:20:41,627 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=449880.0, ans=0.125
2024-10-08 20:20:44,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=449880.0, ans=0.125
2024-10-08 20:20:48,189 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=449880.0, ans=0.1
2024-10-08 20:20:51,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=450000.0, ans=0.0
2024-10-08 20:21:10,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=450120.0, ans=0.025
2024-10-08 20:21:13,020 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.66 vs. limit=10.0
2024-10-08 20:21:15,742 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=450120.0, ans=0.125
2024-10-08 20:21:26,640 INFO [train.py:1136] (0/2) Epoch 46, batch 800, loss[loss=0.1655, simple_loss=0.2642, pruned_loss=0.03341, over 87351.00 frames. ], tot_loss[loss=0.1703, simple_loss=0.272, pruned_loss=0.03426, over 16795162.58 frames. ], batch size: 296, lr: 5.55e-03, grad_scale: 16.0
2024-10-08 20:21:38,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=450240.0, ans=0.125
2024-10-08 20:21:40,190 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.422e+02 4.007e+02 4.465e+02 5.100e+02 8.543e+02, threshold=8.931e+02, percent-clipped=1.0
2024-10-08 20:21:51,403 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=450360.0, ans=0.0
2024-10-08 20:21:52,257 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.39 vs. limit=15.0
2024-10-08 20:21:53,819 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-46.pt
2024-10-08 20:22:32,835 INFO [train.py:1136] (0/2) Epoch 47, batch 0, loss[loss=0.1644, simple_loss=0.2623, pruned_loss=0.03324, over 86038.00 frames. ], tot_loss[loss=0.1644, simple_loss=0.2623, pruned_loss=0.03324, over 86038.00 frames. ], batch size: 197, lr: 5.49e-03, grad_scale: 32.0
2024-10-08 20:22:32,836 INFO [train.py:1159] (0/2) Computing validation loss
2024-10-08 20:22:43,999 INFO [train.py:1168] (0/2) Epoch 47, validation: loss=0.1679, simple_loss=0.2766, pruned_loss=0.02961, over 1382211.00 frames.
2024-10-08 20:22:44,000 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB
2024-10-08 20:22:49,809 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=450432.0, ans=0.125
2024-10-08 20:23:02,909 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=450552.0, ans=0.125
2024-10-08 20:23:18,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=450672.0, ans=0.05
2024-10-08 20:23:48,523 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=450792.0, ans=0.0
2024-10-08 20:23:57,101 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-10-08 20:24:00,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=450912.0, ans=0.1
2024-10-08 20:24:15,313 INFO [train.py:1136] (0/2) Epoch 47, batch 50, loss[loss=0.1779, simple_loss=0.2843, pruned_loss=0.03575, over 84323.00 frames. ], tot_loss[loss=0.1693, simple_loss=0.2714, pruned_loss=0.03356, over 3879987.64 frames. ], batch size: 958, lr: 5.48e-03, grad_scale: 16.0
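The checkpoint.py:75 entries ("Saving checkpoint to zipformer/exp/epoch-NN.pt") fire once per epoch, right after the final batch group. A bare-bones version of such an epoch checkpoint is sketched below; the real checkpoint.py also stores things like sampler and grad-scaler state, so treat this as a simplified outline:

import torch

def save_checkpoint(filename, model, optimizer, scheduler, epoch: int,
                    batch_idx_train: int) -> None:
    # Unwrap DDP so the checkpoint holds plain module parameters.
    if hasattr(model, "module"):
        model = model.module
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "scheduler": scheduler.state_dict(),
            "epoch": epoch,
            "batch_idx_train": batch_idx_train,
        },
        filename,
    )

# e.g. save_checkpoint("zipformer/exp/epoch-46.pt", model, optimizer,
#                      scheduler, epoch=46, batch_idx_train=...)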
2024-10-08 20:24:49,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=451152.0, ans=0.0
2024-10-08 20:24:51,351 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=451152.0, ans=0.125
2024-10-08 20:24:54,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=451272.0, ans=0.125
2024-10-08 20:24:59,894 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=451272.0, ans=0.07
2024-10-08 20:25:15,870 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=451392.0, ans=0.2
2024-10-08 20:25:37,828 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=451512.0, ans=0.1
2024-10-08 20:25:40,905 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.411e+02 4.043e+02 4.474e+02 5.228e+02 8.854e+02, threshold=8.948e+02, percent-clipped=0.0
2024-10-08 20:25:53,062 INFO [train.py:1136] (0/2) Epoch 47, batch 100, loss[loss=0.1611, simple_loss=0.2696, pruned_loss=0.02628, over 87183.00 frames. ], tot_loss[loss=0.1693, simple_loss=0.2711, pruned_loss=0.03378, over 6780789.88 frames. ], batch size: 517, lr: 5.48e-03, grad_scale: 16.0
2024-10-08 20:25:53,924 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.68 vs. limit=15.0
2024-10-08 20:26:00,580 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.64 vs. limit=15.0
2024-10-08 20:26:46,281 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=451872.0, ans=0.025
2024-10-08 20:26:46,354 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=451872.0, ans=0.125
2024-10-08 20:26:55,115 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=451992.0, ans=0.0
2024-10-08 20:27:07,817 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.46 vs. limit=15.0
2024-10-08 20:27:08,788 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=452112.0, ans=0.125
2024-10-08 20:27:10,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=452112.0, ans=0.0
2024-10-08 20:27:16,481 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=452112.0, ans=0.025
2024-10-08 20:27:25,113 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=452112.0, ans=0.125
2024-10-08 20:27:28,527 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=452232.0, ans=0.125
2024-10-08 20:27:28,648 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=452232.0, ans=0.125
2024-10-08 20:27:29,854 INFO [train.py:1136] (0/2) Epoch 47, batch 150, loss[loss=0.1603, simple_loss=0.2616, pruned_loss=0.02947, over 87302.00 frames. ], tot_loss[loss=0.1686, simple_loss=0.2701, pruned_loss=0.0336, over 9081596.75 frames. ], batch size: 264, lr: 5.48e-03, grad_scale: 16.0
2024-10-08 20:27:42,207 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=452232.0, ans=0.035
2024-10-08 20:28:03,728 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.26 vs. limit=22.5
2024-10-08 20:28:05,053 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=452472.0, ans=0.05
2024-10-08 20:28:09,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=452472.0, ans=0.1
2024-10-08 20:28:27,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=452592.0, ans=0.125
2024-10-08 20:28:54,079 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.307e+02 3.982e+02 4.465e+02 5.120e+02 9.647e+02, threshold=8.931e+02, percent-clipped=1.0
2024-10-08 20:29:06,045 INFO [train.py:1136] (0/2) Epoch 47, batch 200, loss[loss=0.167, simple_loss=0.2611, pruned_loss=0.03643, over 87278.00 frames. ], tot_loss[loss=0.1687, simple_loss=0.2705, pruned_loss=0.03344, over 10867801.36 frames. ], batch size: 280, lr: 5.47e-03, grad_scale: 16.0
2024-10-08 20:29:25,974 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.37 vs. limit=15.0
2024-10-08 20:29:32,327 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=452952.0, ans=0.125
2024-10-08 20:30:24,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=453312.0, ans=0.05
2024-10-08 20:30:32,678 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.79 vs. limit=15.0
2024-10-08 20:30:39,281 INFO [train.py:1136] (0/2) Epoch 47, batch 250, loss[loss=0.1634, simple_loss=0.2667, pruned_loss=0.03011, over 87088.00 frames. ], tot_loss[loss=0.1686, simple_loss=0.2703, pruned_loss=0.03343, over 12249074.64 frames. ], batch size: 350, lr: 5.47e-03, grad_scale: 16.0
2024-10-08 20:30:44,737 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=453432.0, ans=0.07
2024-10-08 20:31:22,339 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=453672.0, ans=0.05
2024-10-08 20:31:40,353 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=453792.0, ans=0.125
2024-10-08 20:31:50,312 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=453792.0, ans=0.1
2024-10-08 20:31:58,130 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.336e+02 4.050e+02 4.612e+02 5.485e+02 9.347e+02, threshold=9.224e+02, percent-clipped=1.0
2024-10-08 20:32:00,860 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.24 vs. limit=15.0
2024-10-08 20:32:05,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=453912.0, ans=0.2
2024-10-08 20:32:10,079 INFO [train.py:1136] (0/2) Epoch 47, batch 300, loss[loss=0.1703, simple_loss=0.2674, pruned_loss=0.0366, over 87184.00 frames. ], tot_loss[loss=0.1687, simple_loss=0.2705, pruned_loss=0.03339, over 13358362.67 frames. ], batch size: 330, lr: 5.47e-03, grad_scale: 16.0
2024-10-08 20:32:13,971 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=454032.0, ans=0.0
2024-10-08 20:33:17,731 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=454392.0, ans=0.125
2024-10-08 20:33:31,939 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=454512.0, ans=10.0
2024-10-08 20:33:42,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=454512.0, ans=0.125
2024-10-08 20:33:45,205 INFO [train.py:1136] (0/2) Epoch 47, batch 350, loss[loss=0.1638, simple_loss=0.2588, pruned_loss=0.03439, over 86049.00 frames. ], tot_loss[loss=0.1687, simple_loss=0.2704, pruned_loss=0.03351, over 14200223.22 frames. ], batch size: 197, lr: 5.46e-03, grad_scale: 8.0
2024-10-08 20:33:51,371 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.92 vs. limit=15.0
2024-10-08 20:34:01,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=454752.0, ans=0.025
2024-10-08 20:35:08,568 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.328e+02 3.918e+02 4.263e+02 4.762e+02 5.435e+02, threshold=8.526e+02, percent-clipped=0.0
2024-10-08 20:35:09,083 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=455112.0, ans=0.125
2024-10-08 20:35:21,196 INFO [train.py:1136] (0/2) Epoch 47, batch 400, loss[loss=0.1669, simple_loss=0.2654, pruned_loss=0.03419, over 86399.00 frames. ], tot_loss[loss=0.1683, simple_loss=0.2701, pruned_loss=0.03328, over 14876932.01 frames. ], batch size: 213, lr: 5.46e-03, grad_scale: 16.0
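The lr values in the batch summaries decay smoothly both within and across epochs (5.67e-03 in epoch 45 down to 5.46e-03 here). That behaviour matches an Eden-style schedule, which discounts the base learning rate by power laws in both the step count and the (fractional) epoch. The sketch below assumes the recipe defaults base_lr=0.045, lr_batches=7500 and lr_epochs=3.5, and an estimated global step count, so the printed value is only approximate:

def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Eden-style schedule: smooth power-law decay in both the global batch
    # index and the epoch. Note the batch index here is the optimizer step
    # count, not the larger batch_count shown in the ScheduledFloat entries.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# Roughly 870 batches per epoch in this log, so epoch 47 sits near step 40000:
print(f"{eden_lr(0.045, 40_000, 47.0):.2e}")  # ~5.3e-03, near the logged lr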
2024-10-08 20:35:35,771 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=455232.0, ans=0.0
2024-10-08 20:35:46,626 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.61 vs. limit=15.0
2024-10-08 20:35:47,601 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=455352.0, ans=0.025
2024-10-08 20:36:01,819 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=455472.0, ans=0.125
2024-10-08 20:36:21,295 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=455592.0, ans=0.125
2024-10-08 20:36:31,563 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=455592.0, ans=0.125
2024-10-08 20:36:54,373 INFO [train.py:1136] (0/2) Epoch 47, batch 450, loss[loss=0.1753, simple_loss=0.2794, pruned_loss=0.03557, over 86511.00 frames. ], tot_loss[loss=0.1689, simple_loss=0.2706, pruned_loss=0.03364, over 15394685.50 frames. ], batch size: 668, lr: 5.46e-03, grad_scale: 8.0
2024-10-08 20:37:01,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=455832.0, ans=0.0
2024-10-08 20:37:15,700 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=455952.0, ans=0.5
2024-10-08 20:37:22,878 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=455952.0, ans=0.1
2024-10-08 20:37:42,232 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=456072.0, ans=0.0
2024-10-08 20:37:52,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=456192.0, ans=0.125
2024-10-08 20:37:59,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=456192.0, ans=0.125
2024-10-08 20:38:18,494 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.508e+02 4.084e+02 4.548e+02 5.252e+02 1.239e+03, threshold=9.095e+02, percent-clipped=2.0
2024-10-08 20:38:27,075 INFO [train.py:1136] (0/2) Epoch 47, batch 500, loss[loss=0.1753, simple_loss=0.2814, pruned_loss=0.03466, over 85536.00 frames. ], tot_loss[loss=0.1687, simple_loss=0.2704, pruned_loss=0.03346, over 15791751.83 frames. ], batch size: 787, lr: 5.45e-03, grad_scale: 8.0
2024-10-08 20:38:33,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=456432.0, ans=0.125
2024-10-08 20:38:57,010 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=456552.0, ans=0.125
2024-10-08 20:40:03,378 INFO [train.py:1136] (0/2) Epoch 47, batch 550, loss[loss=0.1615, simple_loss=0.267, pruned_loss=0.02798, over 87337.00 frames. ], tot_loss[loss=0.1688, simple_loss=0.2707, pruned_loss=0.03347, over 16095094.35 frames. ], batch size: 415, lr: 5.45e-03, grad_scale: 8.0
2024-10-08 20:40:34,256 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=457152.0, ans=0.025
2024-10-08 20:41:27,279 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=457512.0, ans=0.125
2024-10-08 20:41:28,457 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.436e+02 3.908e+02 4.427e+02 5.082e+02 6.858e+02, threshold=8.854e+02, percent-clipped=0.0
2024-10-08 20:41:31,448 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.58 vs. limit=15.0
2024-10-08 20:41:35,988 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=457632.0, ans=0.0
2024-10-08 20:41:37,232 INFO [train.py:1136] (0/2) Epoch 47, batch 600, loss[loss=0.175, simple_loss=0.2781, pruned_loss=0.03597, over 86426.00 frames. ], tot_loss[loss=0.1685, simple_loss=0.2701, pruned_loss=0.03345, over 16350005.11 frames. ], batch size: 620, lr: 5.44e-03, grad_scale: 8.0
2024-10-08 20:42:09,901 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.58 vs. limit=22.5
2024-10-08 20:42:10,822 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=457752.0, ans=0.125
2024-10-08 20:42:19,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=457872.0, ans=0.025
2024-10-08 20:42:20,407 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.93 vs. limit=15.0
2024-10-08 20:42:48,154 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=457992.0, ans=0.1
2024-10-08 20:42:58,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=458112.0, ans=0.1
2024-10-08 20:43:00,423 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=458112.0, ans=0.125
2024-10-08 20:43:05,778 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-10-08 20:43:13,969 INFO [train.py:1136] (0/2) Epoch 47, batch 650, loss[loss=0.1615, simple_loss=0.2627, pruned_loss=0.03016, over 87408.00 frames. ], tot_loss[loss=0.1685, simple_loss=0.27, pruned_loss=0.0335, over 16460509.85 frames. ], batch size: 393, lr: 5.44e-03, grad_scale: 8.0
2024-10-08 20:43:14,849 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.74 vs. limit=15.0
2024-10-08 20:43:38,363 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.68 vs. limit=15.0
2024-10-08 20:43:49,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=458352.0, ans=0.1
2024-10-08 20:44:31,433 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=458712.0, ans=0.125
2024-10-08 20:44:32,516 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.487e+02 4.012e+02 4.464e+02 5.191e+02 9.863e+02, threshold=8.927e+02, percent-clipped=1.0
2024-10-08 20:44:41,282 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=458832.0, ans=0.2
2024-10-08 20:44:42,520 INFO [train.py:1136] (0/2) Epoch 47, batch 700, loss[loss=0.1593, simple_loss=0.2634, pruned_loss=0.02758, over 87425.00 frames. ], tot_loss[loss=0.1684, simple_loss=0.2699, pruned_loss=0.03343, over 16598145.33 frames. ], batch size: 393, lr: 5.44e-03, grad_scale: 8.0
2024-10-08 20:44:47,747 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=458832.0, ans=0.0
2024-10-08 20:44:52,567 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=458832.0, ans=0.0
2024-10-08 20:44:54,056 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=458832.0, ans=0.1
2024-10-08 20:45:16,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=459072.0, ans=0.125
2024-10-08 20:45:38,095 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.87 vs. limit=15.0
2024-10-08 20:45:40,770 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.49 vs. limit=22.5
2024-10-08 20:46:05,328 INFO [train.py:1136] (0/2) Epoch 47, batch 750, loss[loss=0.1769, simple_loss=0.283, pruned_loss=0.03542, over 83528.00 frames. ], tot_loss[loss=0.1684, simple_loss=0.2701, pruned_loss=0.03333, over 16709195.93 frames. ], batch size: 1077, lr: 5.43e-03, grad_scale: 8.0
2024-10-08 20:46:10,452 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=459432.0, ans=0.125
2024-10-08 20:46:16,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=459432.0, ans=0.0
2024-10-08 20:46:37,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=459672.0, ans=0.0
2024-10-08 20:46:39,458 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.20 vs. limit=10.0
2024-10-08 20:46:48,377 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.39 vs. limit=12.0
2024-10-08 20:46:52,294 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=459672.0, ans=0.125
2024-10-08 20:47:01,991 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-10-08 20:47:19,598 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.306e+02 4.348e+02 4.781e+02 5.422e+02 8.175e+02, threshold=9.562e+02, percent-clipped=0.0
2024-10-08 20:47:28,111 INFO [train.py:1136] (0/2) Epoch 47, batch 800, loss[loss=0.1748, simple_loss=0.2747, pruned_loss=0.03749, over 86915.00 frames. ], tot_loss[loss=0.17, simple_loss=0.2717, pruned_loss=0.0342, over 16726149.26 frames. ], batch size: 548, lr: 5.43e-03, grad_scale: 16.0
2024-10-08 20:47:36,572 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=460032.0, ans=0.2
2024-10-08 20:47:53,788 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-47.pt
2024-10-08 20:48:31,950 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=460224.0, ans=0.125
2024-10-08 20:48:33,146 INFO [train.py:1136] (0/2) Epoch 48, batch 0, loss[loss=0.1758, simple_loss=0.2731, pruned_loss=0.03926, over 87019.00 frames. ], tot_loss[loss=0.1758, simple_loss=0.2731, pruned_loss=0.03926, over 87019.00 frames. ], batch size: 548, lr: 5.37e-03, grad_scale: 16.0
2024-10-08 20:48:33,147 INFO [train.py:1159] (0/2) Computing validation loss
2024-10-08 20:48:44,336 INFO [train.py:1168] (0/2) Epoch 48, validation: loss=0.1661, simple_loss=0.274, pruned_loss=0.02915, over 1382211.00 frames.
2024-10-08 20:48:44,336 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB
2024-10-08 20:49:23,818 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.83 vs. limit=15.0
2024-10-08 20:49:36,806 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=460464.0, ans=0.0
2024-10-08 20:49:47,410 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.32 vs. limit=10.0
2024-10-08 20:50:03,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=460704.0, ans=0.0
2024-10-08 20:50:19,923 INFO [train.py:1136] (0/2) Epoch 48, batch 50, loss[loss=0.1675, simple_loss=0.2705, pruned_loss=0.03221, over 87286.00 frames. ], tot_loss[loss=0.1692, simple_loss=0.2704, pruned_loss=0.03398, over 3902533.56 frames. ], batch size: 439, lr: 5.37e-03, grad_scale: 16.0
2024-10-08 20:50:53,803 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=460944.0, ans=0.1
2024-10-08 20:50:59,677 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=461064.0, ans=0.2
2024-10-08 20:51:01,830 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0
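grad_scale in the batch summaries (stepping between 8.0, 16.0 and 32.0 above) is the dynamic fp16 loss scale: it is halved when a step overflows and grows back after a run of clean steps, which is what PyTorch's GradScaler does. A generic sketch with the standard torch API, not the recipe's exact wiring (init_scale and growth_interval here are assumptions):

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=8.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)

def train_step(model, optimizer, batch, criterion):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = criterion(model(batch["inputs"]), batch["targets"])
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(optimizer)          # unscales; skips the step on inf/nan
    scaler.update()                 # grow or back off the scale
    return loss.detach(), scaler.get_scale()  # the "grad_scale" in the log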
2024-10-08 20:51:14,980 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=461184.0, ans=0.0
2024-10-08 20:51:16,259 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.323e+02 3.957e+02 4.416e+02 4.796e+02 6.484e+02, threshold=8.832e+02, percent-clipped=0.0
2024-10-08 20:51:21,720 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=461184.0, ans=0.1
2024-10-08 20:51:21,793 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=461184.0, ans=0.125
2024-10-08 20:51:53,059 INFO [train.py:1136] (0/2) Epoch 48, batch 100, loss[loss=0.166, simple_loss=0.2663, pruned_loss=0.03291, over 87193.00 frames. ], tot_loss[loss=0.1686, simple_loss=0.2705, pruned_loss=0.03337, over 6858444.22 frames. ], batch size: 330, lr: 5.37e-03, grad_scale: 16.0
2024-10-08 20:51:59,336 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. limit=6.0
2024-10-08 20:52:08,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=461544.0, ans=0.015
2024-10-08 20:52:12,559 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=461544.0, ans=0.1
2024-10-08 20:52:20,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=461544.0, ans=0.125
2024-10-08 20:52:39,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=461664.0, ans=0.125
2024-10-08 20:52:46,696 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.23 vs. limit=22.5
2024-10-08 20:52:51,931 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.57 vs. limit=15.0
2024-10-08 20:53:23,964 INFO [train.py:1136] (0/2) Epoch 48, batch 150, loss[loss=0.1732, simple_loss=0.277, pruned_loss=0.03468, over 86432.00 frames. ], tot_loss[loss=0.1681, simple_loss=0.2694, pruned_loss=0.03339, over 9168450.82 frames. ], batch size: 667, lr: 5.36e-03, grad_scale: 16.0
2024-10-08 20:53:26,198 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=462024.0, ans=0.1
2024-10-08 20:53:26,251 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=462024.0, ans=0.025
2024-10-08 20:53:28,499 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.55 vs. limit=15.0
2024-10-08 20:53:31,330 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=462024.0, ans=0.1
2024-10-08 20:53:57,775 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=462144.0, ans=0.05
2024-10-08 20:54:04,937 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=462264.0, ans=0.0
2024-10-08 20:54:04,952 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=462264.0, ans=0.2
2024-10-08 20:54:10,305 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=462264.0, ans=0.025
2024-10-08 20:54:21,840 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.224e+02 3.944e+02 4.363e+02 4.908e+02 7.072e+02, threshold=8.726e+02, percent-clipped=0.0
2024-10-08 20:54:29,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=462384.0, ans=0.025
2024-10-08 20:55:00,768 INFO [train.py:1136] (0/2) Epoch 48, batch 200, loss[loss=0.1699, simple_loss=0.2692, pruned_loss=0.0353, over 87225.00 frames. ], tot_loss[loss=0.1686, simple_loss=0.2701, pruned_loss=0.03352, over 10923185.80 frames. ], batch size: 296, lr: 5.36e-03, grad_scale: 16.0
2024-10-08 20:55:47,306 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=462864.0, ans=0.125
2024-10-08 20:56:37,152 INFO [train.py:1136] (0/2) Epoch 48, batch 250, loss[loss=0.1586, simple_loss=0.2641, pruned_loss=0.02653, over 87370.00 frames. ], tot_loss[loss=0.168, simple_loss=0.2697, pruned_loss=0.03315, over 12326640.18 frames. ], batch size: 439, lr: 5.36e-03, grad_scale: 16.0
2024-10-08 20:56:42,990 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=463224.0, ans=0.2
2024-10-08 20:57:00,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=463344.0, ans=0.1
2024-10-08 20:57:16,668 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=463464.0, ans=0.125
2024-10-08 20:57:18,875 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.23 vs. limit=22.5
2024-10-08 20:57:20,508 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.29 vs. limit=15.0
2024-10-08 20:57:34,033 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.156e+02 3.932e+02 4.400e+02 4.969e+02 1.095e+03, threshold=8.801e+02, percent-clipped=1.0
2024-10-08 20:58:02,373 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-10-08 20:58:08,162 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=463704.0, ans=0.125
2024-10-08 20:58:11,215 INFO [train.py:1136] (0/2) Epoch 48, batch 300, loss[loss=0.1605, simple_loss=0.2646, pruned_loss=0.02819, over 87401.00 frames. ], tot_loss[loss=0.1679, simple_loss=0.2695, pruned_loss=0.03312, over 13396103.70 frames. ], batch size: 439, lr: 5.35e-03, grad_scale: 16.0
2024-10-08 20:59:10,155 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=464184.0, ans=0.125
2024-10-08 20:59:40,241 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=464304.0, ans=0.0
2024-10-08 20:59:44,926 INFO [train.py:1136] (0/2) Epoch 48, batch 350, loss[loss=0.1588, simple_loss=0.2651, pruned_loss=0.02623, over 87337.00 frames. ], tot_loss[loss=0.1678, simple_loss=0.2697, pruned_loss=0.03297, over 14252218.22 frames. ], batch size: 464, lr: 5.35e-03, grad_scale: 16.0
2024-10-08 21:00:24,637 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.07 vs. limit=15.0
2024-10-08 21:00:30,841 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=464664.0, ans=0.09899494936611666
2024-10-08 21:00:45,466 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.452e+02 3.852e+02 4.223e+02 4.808e+02 7.476e+02, threshold=8.447e+02, percent-clipped=0.0
2024-10-08 21:01:08,795 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=464904.0, ans=0.95
2024-10-08 21:01:22,556 INFO [train.py:1136] (0/2) Epoch 48, batch 400, loss[loss=0.1788, simple_loss=0.2815, pruned_loss=0.03807, over 86266.00 frames. ], tot_loss[loss=0.1679, simple_loss=0.2697, pruned_loss=0.03305, over 14886653.07 frames. ], batch size: 620, lr: 5.35e-03, grad_scale: 32.0
2024-10-08 21:01:24,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=465024.0, ans=0.125
2024-10-08 21:01:46,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=465144.0, ans=0.1
2024-10-08 21:02:02,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=465264.0, ans=0.1
2024-10-08 21:02:27,891 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=3.69 vs. limit=15.0
2024-10-08 21:02:32,817 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.51 vs. limit=15.0
2024-10-08 21:02:56,144 INFO [train.py:1136] (0/2) Epoch 48, batch 450, loss[loss=0.1841, simple_loss=0.2911, pruned_loss=0.0386, over 81743.00 frames. ], tot_loss[loss=0.1683, simple_loss=0.2701, pruned_loss=0.03324, over 15407946.13 frames. ], batch size: 1245, lr: 5.34e-03, grad_scale: 16.0
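Each "Computing validation loss" / "Epoch N, validation: ..." pair reports a full pass over a fixed dev set, which is why the frame count is always the same 1382211.00. A sketch of such a pass, with hypothetical helper names (compute_loss is assumed to return a summed loss and a frame count per batch):

import torch

def compute_validation_loss(model, valid_loader, compute_loss) -> float:
    # Switch to eval mode, accumulate frame-weighted loss over the whole
    # dev set, then return to training mode.
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_loader:
            loss_sum, num_frames = compute_loss(model, batch)
            tot_loss += float(loss_sum)
            tot_frames += float(num_frames)
    model.train()
    return tot_loss / max(tot_frames, 1.0)  # per-frame validation loss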
2024-10-08 21:02:56,564 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=465624.0, ans=0.0
2024-10-08 21:03:07,579 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=465624.0, ans=0.1
2024-10-08 21:03:15,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=465744.0, ans=0.1
2024-10-08 21:03:22,958 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=465744.0, ans=0.0
2024-10-08 21:03:37,108 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=465864.0, ans=0.1
2024-10-08 21:03:42,781 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=11.61 vs. limit=22.5
2024-10-08 21:03:54,054 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.398e+02 3.926e+02 4.283e+02 4.888e+02 7.553e+02, threshold=8.566e+02, percent-clipped=0.0
2024-10-08 21:04:11,619 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.37 vs. limit=15.0
2024-10-08 21:04:19,515 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=466104.0, ans=0.125
2024-10-08 21:04:23,449 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.89 vs. limit=15.0
2024-10-08 21:04:28,148 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=466224.0, ans=0.125
2024-10-08 21:04:29,486 INFO [train.py:1136] (0/2) Epoch 48, batch 500, loss[loss=0.1569, simple_loss=0.2554, pruned_loss=0.02925, over 86766.00 frames. ], tot_loss[loss=0.1685, simple_loss=0.2702, pruned_loss=0.03336, over 15773701.56 frames. ], batch size: 246, lr: 5.34e-03, grad_scale: 16.0
2024-10-08 21:04:36,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=466224.0, ans=0.125
2024-10-08 21:04:48,152 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=466344.0, ans=0.0
2024-10-08 21:04:57,290 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=466344.0, ans=0.125
2024-10-08 21:05:18,112 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.00 vs. limit=15.0
2024-10-08 21:05:30,520 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=466584.0, ans=0.0
2024-10-08 21:05:32,791 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. limit=6.0
2024-10-08 21:05:34,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=466584.0, ans=0.125
2024-10-08 21:05:35,774 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=466584.0, ans=0.125
2024-10-08 21:05:44,618 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=466704.0, ans=0.2
2024-10-08 21:06:02,807 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.32 vs. limit=15.0
2024-10-08 21:06:05,164 INFO [train.py:1136] (0/2) Epoch 48, batch 550, loss[loss=0.176, simple_loss=0.2787, pruned_loss=0.03667, over 86481.00 frames. ], tot_loss[loss=0.1687, simple_loss=0.2704, pruned_loss=0.03347, over 16073742.04 frames. ], batch size: 620, lr: 5.34e-03, grad_scale: 16.0
2024-10-08 21:06:05,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=466824.0, ans=0.125
2024-10-08 21:06:12,986 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.52 vs. limit=10.0
2024-10-08 21:06:16,143 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=466824.0, ans=0.025
2024-10-08 21:06:47,528 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.57 vs. limit=12.0
2024-10-08 21:07:09,349 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.403e+02 4.173e+02 4.919e+02 5.663e+02 1.745e+03, threshold=9.837e+02, percent-clipped=2.0
2024-10-08 21:07:31,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=467304.0, ans=0.0
2024-10-08 21:07:45,299 INFO [train.py:1136] (0/2) Epoch 48, batch 600, loss[loss=0.1768, simple_loss=0.2806, pruned_loss=0.03645, over 86023.00 frames. ], tot_loss[loss=0.1689, simple_loss=0.2707, pruned_loss=0.03358, over 16253824.01 frames. ], batch size: 721, lr: 5.33e-03, grad_scale: 16.0
2024-10-08 21:07:53,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=467424.0, ans=0.0
2024-10-08 21:08:01,505 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.31 vs. limit=22.5
2024-10-08 21:08:04,659 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=467544.0, ans=0.125
2024-10-08 21:08:12,060 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=467544.0, ans=0.125
2024-10-08 21:09:19,972 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=467904.0, ans=0.0
2024-10-08 21:09:23,033 INFO [train.py:1136] (0/2) Epoch 48, batch 650, loss[loss=0.1542, simple_loss=0.2626, pruned_loss=0.02291, over 87175.00 frames. ], tot_loss[loss=0.1691, simple_loss=0.271, pruned_loss=0.0336, over 16396629.58 frames. ], batch size: 517, lr: 5.33e-03, grad_scale: 16.0
2024-10-08 21:10:21,923 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.389e+02 4.052e+02 4.350e+02 5.217e+02 7.898e+02, threshold=8.700e+02, percent-clipped=0.0
2024-10-08 21:10:33,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=468384.0, ans=0.1
2024-10-08 21:10:34,934 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=468384.0, ans=0.0
2024-10-08 21:10:55,981 INFO [train.py:1136] (0/2) Epoch 48, batch 700, loss[loss=0.1604, simple_loss=0.2667, pruned_loss=0.02712, over 87434.00 frames. ], tot_loss[loss=0.1687, simple_loss=0.2704, pruned_loss=0.03349, over 16572697.56 frames. ], batch size: 464, lr: 5.33e-03, grad_scale: 16.0
2024-10-08 21:11:19,163 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=468744.0, ans=0.025
2024-10-08 21:11:19,576 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=2.91 vs. limit=15.0
2024-10-08 21:11:27,465 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=468744.0, ans=0.2
2024-10-08 21:11:36,812 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=468864.0, ans=0.125
2024-10-08 21:11:57,628 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=468984.0, ans=0.125
2024-10-08 21:12:19,384 INFO [train.py:1136] (0/2) Epoch 48, batch 750, loss[loss=0.1665, simple_loss=0.2636, pruned_loss=0.03469, over 86623.00 frames. ], tot_loss[loss=0.1691, simple_loss=0.2707, pruned_loss=0.03371, over 16663985.70 frames. ], batch size: 229, lr: 5.32e-03, grad_scale: 16.0
2024-10-08 21:12:24,560 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-10-08 21:12:29,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=469224.0, ans=0.1
2024-10-08 21:12:51,518 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=469344.0, ans=0.125
2024-10-08 21:13:09,116 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=469584.0, ans=0.125
2024-10-08 21:13:11,771 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.448e+02 4.080e+02 4.515e+02 5.201e+02 9.522e+02, threshold=9.030e+02, percent-clipped=1.0
2024-10-08 21:13:15,338 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=469584.0, ans=0.035
2024-10-08 21:13:26,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=469704.0, ans=0.07
2024-10-08 21:13:43,946 INFO [train.py:1136] (0/2) Epoch 48, batch 800, loss[loss=0.1601, simple_loss=0.2683, pruned_loss=0.02589, over 87203.00 frames. ], tot_loss[loss=0.1689, simple_loss=0.2707, pruned_loss=0.03353, over 16705812.33 frames. ], batch size: 517, lr: 5.32e-03, grad_scale: 32.0
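Note: the grad_scale field in the batch records is the dynamic loss scale used for fp16 training; it doubles when training has been stable for a while (16.0 to 32.0 above) and is reduced after overflows (it falls back to 16.0 again, and drops as low as 4.0 later in this log). A minimal sketch using PyTorch's stock GradScaler follows; the growth/backoff settings are illustrative defaults, not necessarily the training script's configuration.

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_factor=2.0,
                                       backoff_factor=0.5, growth_interval=2000)

    def train_step(model, optimizer, features, targets, loss_fn):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = loss_fn(model(features), targets)
        scaler.scale(loss).backward()  # gradients carry the current grad_scale
        scaler.step(optimizer)         # unscales; skips the step on inf/nan gradients
        scaler.update()                # grows the scale when stable, halves it on overflow
        return loss.detach()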
2024-10-08 21:13:44,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=469824.0, ans=0.0
2024-10-08 21:13:44,425 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=469824.0, ans=0.05
2024-10-08 21:14:03,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=469944.0, ans=0.0
2024-10-08 21:14:10,910 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-48.pt
2024-10-08 21:15:02,456 INFO [train.py:1136] (0/2) Epoch 49, batch 0, loss[loss=0.1618, simple_loss=0.2661, pruned_loss=0.02878, over 87301.00 frames. ], tot_loss[loss=0.1618, simple_loss=0.2661, pruned_loss=0.02878, over 87301.00 frames. ], batch size: 393, lr: 5.26e-03, grad_scale: 32.0
2024-10-08 21:15:02,458 INFO [train.py:1159] (0/2) Computing validation loss
2024-10-08 21:15:13,673 INFO [train.py:1168] (0/2) Epoch 49, validation: loss=0.1674, simple_loss=0.2755, pruned_loss=0.02963, over 1382211.00 frames.
2024-10-08 21:15:13,674 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB
2024-10-08 21:15:21,946 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.54 vs. limit=15.0
2024-10-08 21:15:34,015 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.08 vs. limit=15.0
2024-10-08 21:15:55,448 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=470256.0, ans=0.5
2024-10-08 21:16:32,740 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=470496.0, ans=0.125
2024-10-08 21:16:44,579 INFO [train.py:1136] (0/2) Epoch 49, batch 50, loss[loss=0.1619, simple_loss=0.2675, pruned_loss=0.02818, over 87252.00 frames. ], tot_loss[loss=0.1687, simple_loss=0.2709, pruned_loss=0.0332, over 3846300.54 frames. ], batch size: 415, lr: 5.26e-03, grad_scale: 32.0
2024-10-08 21:16:59,560 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.57 vs. limit=10.0
2024-10-08 21:17:10,289 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.330e+02 3.997e+02 4.579e+02 5.300e+02 7.583e+02, threshold=9.159e+02, percent-clipped=0.0
2024-10-08 21:18:12,915 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=471096.0, ans=0.0
2024-10-08 21:18:16,035 INFO [train.py:1136] (0/2) Epoch 49, batch 100, loss[loss=0.172, simple_loss=0.2697, pruned_loss=0.03721, over 87252.00 frames. ], tot_loss[loss=0.1665, simple_loss=0.2678, pruned_loss=0.03263, over 6824130.33 frames. ], batch size: 296, lr: 5.26e-03, grad_scale: 32.0
2024-10-08 21:18:32,661 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=471216.0, ans=0.125
2024-10-08 21:19:31,328 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.43 vs. limit=12.0
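Note: checkpoint.py has just written zipformer/exp/epoch-48.pt. A hedged sketch for inspecting such a file offline, assuming (the log does not guarantee this) that it is a dict holding a "model" state_dict plus bookkeeping entries:

    import torch

    ckpt = torch.load("zipformer/exp/epoch-48.pt", map_location="cpu")
    print(sorted(ckpt.keys()))            # e.g. keys like "model", "optimizer", ... (assumption)
    state_dict = ckpt.get("model", ckpt)  # fall back to treating the file as a bare state_dict
    num_params = sum(v.numel() for v in state_dict.values() if hasattr(v, "numel"))
    print(f"tensors: {len(state_dict)}, parameters: {num_params}")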
2024-10-08 21:19:38,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=471696.0, ans=0.07
2024-10-08 21:19:45,893 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=471696.0, ans=0.0
2024-10-08 21:19:52,013 INFO [train.py:1136] (0/2) Epoch 49, batch 150, loss[loss=0.1718, simple_loss=0.2707, pruned_loss=0.03641, over 87056.00 frames. ], tot_loss[loss=0.1665, simple_loss=0.2677, pruned_loss=0.03265, over 9129563.79 frames. ], batch size: 350, lr: 5.25e-03, grad_scale: 32.0
2024-10-08 21:19:56,290 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-10-08 21:19:58,754 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.53 vs. limit=15.0
2024-10-08 21:20:03,635 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=471816.0, ans=0.125
2024-10-08 21:20:20,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=471936.0, ans=0.125
2024-10-08 21:20:21,237 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.398e+02 3.917e+02 4.359e+02 5.004e+02 7.244e+02, threshold=8.717e+02, percent-clipped=0.0
2024-10-08 21:21:19,693 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=472296.0, ans=0.125
2024-10-08 21:21:24,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=472296.0, ans=0.1
2024-10-08 21:21:29,726 INFO [train.py:1136] (0/2) Epoch 49, batch 200, loss[loss=0.1773, simple_loss=0.2825, pruned_loss=0.03605, over 83486.00 frames. ], tot_loss[loss=0.1676, simple_loss=0.2688, pruned_loss=0.03319, over 10847379.02 frames. ], batch size: 1078, lr: 5.25e-03, grad_scale: 32.0
2024-10-08 21:21:57,734 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=472536.0, ans=0.025
2024-10-08 21:22:00,966 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=472536.0, ans=0.035
2024-10-08 21:22:19,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=472656.0, ans=0.125
2024-10-08 21:22:42,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=472896.0, ans=0.0
2024-10-08 21:23:03,507 INFO [train.py:1136] (0/2) Epoch 49, batch 250, loss[loss=0.1834, simple_loss=0.2834, pruned_loss=0.04164, over 69070.00 frames. ], tot_loss[loss=0.1675, simple_loss=0.2688, pruned_loss=0.03307, over 12249285.14 frames. ], batch size: 1960, lr: 5.25e-03, grad_scale: 16.0
2024-10-08 21:23:05,950 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=473016.0, ans=0.1
2024-10-08 21:23:18,861 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=473016.0, ans=0.125
2024-10-08 21:23:30,576 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=473136.0, ans=0.0
2024-10-08 21:23:33,556 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.514e+02 3.985e+02 4.430e+02 4.954e+02 7.566e+02, threshold=8.860e+02, percent-clipped=0.0
2024-10-08 21:23:39,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=473256.0, ans=0.035
2024-10-08 21:23:53,846 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=473256.0, ans=0.1
2024-10-08 21:24:02,098 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.63 vs. limit=15.0
2024-10-08 21:24:08,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=473376.0, ans=0.04949747468305833
2024-10-08 21:24:38,001 INFO [train.py:1136] (0/2) Epoch 49, batch 300, loss[loss=0.1806, simple_loss=0.2855, pruned_loss=0.03779, over 85368.00 frames. ], tot_loss[loss=0.1684, simple_loss=0.27, pruned_loss=0.03339, over 13285513.47 frames. ], batch size: 786, lr: 5.24e-03, grad_scale: 16.0
2024-10-08 21:25:37,316 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=473976.0, ans=0.125
2024-10-08 21:25:40,641 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=473976.0, ans=0.1
2024-10-08 21:26:08,495 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=474096.0, ans=0.1
2024-10-08 21:26:11,967 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=474216.0, ans=0.0
2024-10-08 21:26:13,269 INFO [train.py:1136] (0/2) Epoch 49, batch 350, loss[loss=0.1809, simple_loss=0.2829, pruned_loss=0.03949, over 86245.00 frames. ], tot_loss[loss=0.1678, simple_loss=0.269, pruned_loss=0.03334, over 14171885.95 frames. ], batch size: 620, lr: 5.24e-03, grad_scale: 16.0
2024-10-08 21:26:43,451 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.489e+02 3.928e+02 4.420e+02 4.922e+02 6.837e+02, threshold=8.841e+02, percent-clipped=0.0
2024-10-08 21:27:50,581 INFO [train.py:1136] (0/2) Epoch 49, batch 400, loss[loss=0.1821, simple_loss=0.2797, pruned_loss=0.04229, over 69352.00 frames. ], tot_loss[loss=0.168, simple_loss=0.2693, pruned_loss=0.03338, over 14807440.74 frames. ], batch size: 1960, lr: 5.24e-03, grad_scale: 16.0
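Note: the ScheduledFloat records print the current value ("ans") of schedule-controlled hyperparameters such as dropout probabilities, skip rates, and balancer limits as a function of batch_count. An illustrative piecewise-linear schedule in that spirit is sketched below; the breakpoints are invented for the example and this is not scaling.py's code.

    import bisect

    class PiecewiseLinear:
        def __init__(self, *points):          # (batch_count, value) pairs, assumed sorted
            self.xs = [p[0] for p in points]
            self.ys = [p[1] for p in points]

        def __call__(self, batch_count):
            i = bisect.bisect_right(self.xs, batch_count)
            if i == 0:
                return self.ys[0]             # before the first breakpoint
            if i == len(self.xs):
                return self.ys[-1]            # past the last breakpoint
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    dropout_p = PiecewiseLinear((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p(473016.0))  # 0.1, far past the last breakpoint, like the "ans=0.1" entries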
2024-10-08 21:28:02,159 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=474816.0, ans=0.125
2024-10-08 21:28:03,684 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=474816.0, ans=0.0
2024-10-08 21:28:11,576 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.38 vs. limit=15.0
2024-10-08 21:28:26,578 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=475056.0, ans=0.1
2024-10-08 21:28:26,733 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=475056.0, ans=0.2
2024-10-08 21:29:20,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=475296.0, ans=0.125
2024-10-08 21:29:27,853 INFO [train.py:1136] (0/2) Epoch 49, batch 450, loss[loss=0.1592, simple_loss=0.2651, pruned_loss=0.02669, over 87391.00 frames. ], tot_loss[loss=0.1682, simple_loss=0.2693, pruned_loss=0.03353, over 15325778.22 frames. ], batch size: 464, lr: 5.23e-03, grad_scale: 4.0
2024-10-08 21:29:57,772 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=475536.0, ans=0.125
2024-10-08 21:30:02,476 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.509e+02 4.357e+02 4.797e+02 5.428e+02 6.575e+02, threshold=9.594e+02, percent-clipped=0.0
2024-10-08 21:30:06,587 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=475656.0, ans=0.0
2024-10-08 21:30:25,515 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.63 vs. limit=15.0
2024-10-08 21:31:01,216 INFO [train.py:1136] (0/2) Epoch 49, batch 500, loss[loss=0.1622, simple_loss=0.2646, pruned_loss=0.02995, over 87358.00 frames. ], tot_loss[loss=0.1683, simple_loss=0.2699, pruned_loss=0.03339, over 15727544.43 frames. ], batch size: 372, lr: 5.23e-03, grad_scale: 8.0
2024-10-08 21:31:05,033 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=476016.0, ans=0.125
2024-10-08 21:31:05,131 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=476016.0, ans=0.125
2024-10-08 21:31:33,460 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=476136.0, ans=0.125
2024-10-08 21:31:37,689 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.88 vs. limit=8.0
2024-10-08 21:32:01,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=476376.0, ans=0.125
2024-10-08 21:32:01,848 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.01 vs. limit=15.0
2024-10-08 21:32:18,752 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=476496.0, ans=0.1
2024-10-08 21:32:27,008 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.30 vs. limit=15.0
2024-10-08 21:32:31,575 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=476496.0, ans=0.125
2024-10-08 21:32:36,305 INFO [train.py:1136] (0/2) Epoch 49, batch 550, loss[loss=0.176, simple_loss=0.2798, pruned_loss=0.03613, over 85423.00 frames. ], tot_loss[loss=0.1687, simple_loss=0.2702, pruned_loss=0.03361, over 16047908.41 frames. ], batch size: 787, lr: 5.23e-03, grad_scale: 8.0
2024-10-08 21:32:40,817 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.60 vs. limit=15.0
2024-10-08 21:32:59,292 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=476736.0, ans=0.125
2024-10-08 21:33:11,387 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.396e+02 4.065e+02 4.547e+02 4.917e+02 6.939e+02, threshold=9.093e+02, percent-clipped=0.0
2024-10-08 21:33:50,228 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.39 vs. limit=10.0
2024-10-08 21:34:12,482 INFO [train.py:1136] (0/2) Epoch 49, batch 600, loss[loss=0.1764, simple_loss=0.2806, pruned_loss=0.03616, over 86999.00 frames. ], tot_loss[loss=0.1682, simple_loss=0.2697, pruned_loss=0.03339, over 16289507.47 frames. ], batch size: 583, lr: 5.22e-03, grad_scale: 8.0
2024-10-08 21:34:36,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=477336.0, ans=0.0
2024-10-08 21:34:41,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=477336.0, ans=0.0
2024-10-08 21:34:41,499 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=477336.0, ans=0.125
2024-10-08 21:35:25,742 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.69 vs. limit=15.0
2024-10-08 21:35:39,704 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.16 vs. limit=22.5
2024-10-08 21:35:47,640 INFO [train.py:1136] (0/2) Epoch 49, batch 650, loss[loss=0.1612, simple_loss=0.2661, pruned_loss=0.02814, over 87507.00 frames. ], tot_loss[loss=0.1679, simple_loss=0.2691, pruned_loss=0.03339, over 16484377.52 frames. ], batch size: 393, lr: 5.22e-03, grad_scale: 8.0
2024-10-08 21:35:54,211 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=477816.0, ans=0.125
2024-10-08 21:36:23,306 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.585e+02 3.910e+02 4.318e+02 5.207e+02 1.221e+03, threshold=8.636e+02, percent-clipped=1.0
2024-10-08 21:37:08,206 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.64 vs. limit=15.0
2024-10-08 21:37:15,826 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=478416.0, ans=0.0
2024-10-08 21:37:15,934 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=478416.0, ans=0.125
2024-10-08 21:37:17,131 INFO [train.py:1136] (0/2) Epoch 49, batch 700, loss[loss=0.163, simple_loss=0.2579, pruned_loss=0.03403, over 86150.00 frames. ], tot_loss[loss=0.168, simple_loss=0.2691, pruned_loss=0.03342, over 16626815.68 frames. ], batch size: 197, lr: 5.22e-03, grad_scale: 8.0
2024-10-08 21:37:26,045 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.86 vs. limit=15.0
2024-10-08 21:37:50,985 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=478656.0, ans=0.025
2024-10-08 21:37:57,895 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.14 vs. limit=15.0
2024-10-08 21:37:58,748 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=478656.0, ans=0.2
2024-10-08 21:38:14,750 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=478776.0, ans=0.125
2024-10-08 21:38:39,700 INFO [train.py:1136] (0/2) Epoch 49, batch 750, loss[loss=0.1629, simple_loss=0.2639, pruned_loss=0.03096, over 87040.00 frames. ], tot_loss[loss=0.1677, simple_loss=0.2692, pruned_loss=0.03314, over 16738586.00 frames. ], batch size: 350, lr: 5.21e-03, grad_scale: 8.0
2024-10-08 21:38:41,616 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=479016.0, ans=0.2
2024-10-08 21:39:10,004 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.269e+02 3.811e+02 4.211e+02 4.771e+02 1.100e+03, threshold=8.422e+02, percent-clipped=1.0
2024-10-08 21:39:18,731 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.73 vs. limit=6.0
2024-10-08 21:39:38,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=479376.0, ans=0.2
2024-10-08 21:39:57,756 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.55 vs. limit=15.0
2024-10-08 21:40:03,218 INFO [train.py:1136] (0/2) Epoch 49, batch 800, loss[loss=0.163, simple_loss=0.2652, pruned_loss=0.03042, over 87311.00 frames. ], tot_loss[loss=0.1677, simple_loss=0.2693, pruned_loss=0.03304, over 16787820.86 frames. ], batch size: 372, lr: 5.21e-03, grad_scale: 16.0
2024-10-08 21:40:30,141 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-49.pt
2024-10-08 21:41:11,032 INFO [train.py:1136] (0/2) Epoch 50, batch 0, loss[loss=0.1769, simple_loss=0.282, pruned_loss=0.03591, over 81873.00 frames. ], tot_loss[loss=0.1769, simple_loss=0.282, pruned_loss=0.03591, over 81873.00 frames. ], batch size: 1245, lr: 5.16e-03, grad_scale: 32.0
2024-10-08 21:41:11,034 INFO [train.py:1159] (0/2) Computing validation loss
2024-10-08 21:41:22,263 INFO [train.py:1168] (0/2) Epoch 50, validation: loss=0.1675, simple_loss=0.2757, pruned_loss=0.02964, over 1382211.00 frames.
2024-10-08 21:41:22,264 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB
2024-10-08 21:41:47,101 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=7.61 vs. limit=15.0
2024-10-08 21:41:49,848 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/checkpoint-40000.pt
2024-10-08 21:42:10,906 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=480048.0, ans=0.0
2024-10-08 21:42:28,838 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=480168.0, ans=0.0
2024-10-08 21:42:35,651 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=480168.0, ans=0.1
2024-10-08 21:42:40,141 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.51 vs. limit=15.0
2024-10-08 21:43:01,836 INFO [train.py:1136] (0/2) Epoch 50, batch 50, loss[loss=0.1624, simple_loss=0.2661, pruned_loss=0.02938, over 87412.00 frames. ], tot_loss[loss=0.1687, simple_loss=0.2712, pruned_loss=0.03309, over 3859109.23 frames. ], batch size: 372, lr: 5.15e-03, grad_scale: 32.0
2024-10-08 21:43:07,055 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.449e+02 4.032e+02 4.512e+02 5.256e+02 7.574e+02, threshold=9.024e+02, percent-clipped=0.0
2024-10-08 21:43:56,407 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=480648.0, ans=0.2
2024-10-08 21:44:36,540 INFO [train.py:1136] (0/2) Epoch 50, batch 100, loss[loss=0.201, simple_loss=0.2998, pruned_loss=0.05114, over 78767.00 frames. ], tot_loss[loss=0.1694, simple_loss=0.2717, pruned_loss=0.03355, over 6756483.34 frames. ], batch size: 1493, lr: 5.15e-03, grad_scale: 16.0
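Note: each "Computing validation loss" / "Epoch N, validation: ..." pair comes from evaluating the model on a fixed dev set (the same 1382211.00 frames each epoch). A minimal frame-weighted validation pass is sketched below; the batch keys and the loss call are placeholders, not the actual train.py code.

    import torch

    def compute_validation_loss(model, valid_loader, loss_fn, device="cuda:0"):
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        with torch.no_grad():
            for batch in valid_loader:
                features = batch["features"].to(device)   # key names are assumptions
                targets = batch["targets"].to(device)
                num_frames = features.size(0) * features.size(1)
                loss = loss_fn(model(features), targets)
                tot_loss += loss.item() * num_frames      # frame-weighted average
                tot_frames += num_frames
        model.train()
        return tot_loss / tot_frames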
2024-10-08 21:44:37,057 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=481008.0, ans=0.0
2024-10-08 21:44:53,545 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=481008.0, ans=0.0
2024-10-08 21:45:12,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=481128.0, ans=0.1
2024-10-08 21:45:14,127 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=481128.0, ans=0.125
2024-10-08 21:45:32,025 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=481248.0, ans=0.125
2024-10-08 21:45:50,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=481368.0, ans=0.025
2024-10-08 21:45:53,920 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=481488.0, ans=0.04949747468305833
2024-10-08 21:46:15,688 INFO [train.py:1136] (0/2) Epoch 50, batch 150, loss[loss=0.1728, simple_loss=0.2794, pruned_loss=0.03308, over 85365.00 frames. ], tot_loss[loss=0.1692, simple_loss=0.2713, pruned_loss=0.0335, over 9040810.85 frames. ], batch size: 787, lr: 5.15e-03, grad_scale: 16.0
2024-10-08 21:46:22,710 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.484e+02 4.019e+02 4.326e+02 5.069e+02 9.164e+02, threshold=8.653e+02, percent-clipped=1.0
2024-10-08 21:47:02,847 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=481848.0, ans=0.125
2024-10-08 21:47:30,184 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.84 vs. limit=15.0
2024-10-08 21:47:36,993 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=4.96 vs. limit=15.0
2024-10-08 21:47:55,377 INFO [train.py:1136] (0/2) Epoch 50, batch 200, loss[loss=0.1902, simple_loss=0.2931, pruned_loss=0.04371, over 81981.00 frames. ], tot_loss[loss=0.1682, simple_loss=0.2702, pruned_loss=0.03311, over 10794141.68 frames. ], batch size: 1245, lr: 5.14e-03, grad_scale: 8.0
2024-10-08 21:48:13,542 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=482328.0, ans=0.0
2024-10-08 21:48:30,746 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=482328.0, ans=0.1
2024-10-08 21:49:05,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=482568.0, ans=0.125
2024-10-08 21:49:36,671 INFO [train.py:1136] (0/2) Epoch 50, batch 250, loss[loss=0.18, simple_loss=0.2795, pruned_loss=0.04027, over 87008.00 frames. ], tot_loss[loss=0.1693, simple_loss=0.2709, pruned_loss=0.03387, over 12149582.72 frames. ], batch size: 583, lr: 5.14e-03, grad_scale: 8.0
2024-10-08 21:49:45,506 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.257e+02 3.929e+02 4.493e+02 5.115e+02 7.311e+02, threshold=8.987e+02, percent-clipped=0.0
2024-10-08 21:49:49,463 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=482808.0, ans=0.2
2024-10-08 21:50:30,821 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=483048.0, ans=0.2
2024-10-08 21:50:32,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=483168.0, ans=0.0
2024-10-08 21:50:37,718 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=483168.0, ans=0.0
2024-10-08 21:50:51,896 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=483288.0, ans=0.2
2024-10-08 21:50:59,637 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.14 vs. limit=15.0
2024-10-08 21:51:08,243 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=483288.0, ans=0.0
2024-10-08 21:51:10,413 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.76 vs. limit=15.0
2024-10-08 21:51:11,145 INFO [train.py:1136] (0/2) Epoch 50, batch 300, loss[loss=0.1628, simple_loss=0.2617, pruned_loss=0.03201, over 87158.00 frames. ], tot_loss[loss=0.1685, simple_loss=0.27, pruned_loss=0.03352, over 13268238.36 frames. ], batch size: 330, lr: 5.14e-03, grad_scale: 8.0
2024-10-08 21:51:33,037 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=483528.0, ans=0.0
2024-10-08 21:51:34,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=483528.0, ans=0.125
2024-10-08 21:51:35,239 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.37 vs. limit=22.5
2024-10-08 21:51:59,712 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=483648.0, ans=0.125
2024-10-08 21:52:01,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=483648.0, ans=0.125
2024-10-08 21:52:39,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=483888.0, ans=0.0
2024-10-08 21:52:47,232 INFO [train.py:1136] (0/2) Epoch 50, batch 350, loss[loss=0.1603, simple_loss=0.2584, pruned_loss=0.03108, over 87191.00 frames. ], tot_loss[loss=0.1685, simple_loss=0.2704, pruned_loss=0.03335, over 14142674.52 frames. ], batch size: 296, lr: 5.14e-03, grad_scale: 8.0
2024-10-08 21:53:00,043 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.414e+02 4.031e+02 4.507e+02 4.814e+02 3.007e+03, threshold=9.013e+02, percent-clipped=1.0
2024-10-08 21:53:18,313 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=484128.0, ans=0.0
2024-10-08 21:53:30,134 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=484248.0, ans=0.0
2024-10-08 21:53:49,469 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=3.88 vs. limit=15.0
2024-10-08 21:53:54,203 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=484368.0, ans=0.0
2024-10-08 21:54:15,217 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=484488.0, ans=0.0
2024-10-08 21:54:27,077 INFO [train.py:1136] (0/2) Epoch 50, batch 400, loss[loss=0.1595, simple_loss=0.266, pruned_loss=0.02651, over 87348.00 frames. ], tot_loss[loss=0.1682, simple_loss=0.2703, pruned_loss=0.03306, over 14816304.89 frames. ], batch size: 490, lr: 5.13e-03, grad_scale: 16.0
2024-10-08 21:55:09,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=484848.0, ans=0.1
2024-10-08 21:55:32,907 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=484968.0, ans=0.125
2024-10-08 21:56:03,074 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=485088.0, ans=0.125
2024-10-08 21:56:07,733 INFO [train.py:1136] (0/2) Epoch 50, batch 450, loss[loss=0.1854, simple_loss=0.2822, pruned_loss=0.04426, over 86986.00 frames. ], tot_loss[loss=0.1685, simple_loss=0.2706, pruned_loss=0.03322, over 15321749.32 frames. ], batch size: 548, lr: 5.13e-03, grad_scale: 16.0
2024-10-08 21:56:18,228 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.427e+02 3.866e+02 4.185e+02 4.581e+02 6.799e+02, threshold=8.371e+02, percent-clipped=0.0
2024-10-08 21:56:51,537 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.24 vs. limit=12.0
2024-10-08 21:57:36,531 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=485688.0, ans=0.1
2024-10-08 21:57:38,715 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=2.73 vs. limit=10.0
2024-10-08 21:57:42,893 INFO [train.py:1136] (0/2) Epoch 50, batch 500, loss[loss=0.1637, simple_loss=0.2644, pruned_loss=0.03145, over 87214.00 frames. ], tot_loss[loss=0.1681, simple_loss=0.2698, pruned_loss=0.03321, over 15738274.48 frames. ], batch size: 313, lr: 5.13e-03, grad_scale: 16.0
2024-10-08 21:58:22,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=486048.0, ans=0.2
2024-10-08 21:58:50,584 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=486168.0, ans=0.125
2024-10-08 21:59:17,977 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=486288.0, ans=10.0
2024-10-08 21:59:22,664 INFO [train.py:1136] (0/2) Epoch 50, batch 550, loss[loss=0.154, simple_loss=0.2596, pruned_loss=0.02422, over 87412.00 frames. ], tot_loss[loss=0.168, simple_loss=0.2697, pruned_loss=0.03321, over 16044962.25 frames. ], batch size: 464, lr: 5.12e-03, grad_scale: 16.0
2024-10-08 21:59:24,865 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=486408.0, ans=0.025
2024-10-08 21:59:32,792 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.423e+02 3.928e+02 4.256e+02 4.684e+02 7.906e+02, threshold=8.512e+02, percent-clipped=0.0
2024-10-08 21:59:59,869 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.07 vs. limit=6.0
2024-10-08 22:00:54,096 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=486888.0, ans=0.0
2024-10-08 22:00:58,962 INFO [train.py:1136] (0/2) Epoch 50, batch 600, loss[loss=0.1655, simple_loss=0.2641, pruned_loss=0.03341, over 87138.00 frames. ], tot_loss[loss=0.1676, simple_loss=0.2693, pruned_loss=0.03294, over 16313552.61 frames. ], batch size: 330, lr: 5.12e-03, grad_scale: 16.0
2024-10-08 22:01:16,566 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=487128.0, ans=0.0
2024-10-08 22:01:21,923 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=487128.0, ans=0.1
2024-10-08 22:02:21,833 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.82 vs. limit=15.0
2024-10-08 22:02:29,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=487488.0, ans=0.0
2024-10-08 22:02:31,133 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=487488.0, ans=0.125
2024-10-08 22:02:37,664 INFO [train.py:1136] (0/2) Epoch 50, batch 650, loss[loss=0.1603, simple_loss=0.2551, pruned_loss=0.03278, over 86449.00 frames. ], tot_loss[loss=0.1679, simple_loss=0.2697, pruned_loss=0.03303, over 16481490.21 frames. ], batch size: 197, lr: 5.12e-03, grad_scale: 16.0
2024-10-08 22:02:38,844 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.63 vs. limit=15.0
2024-10-08 22:02:47,839 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.478e+02 3.806e+02 4.288e+02 4.821e+02 7.387e+02, threshold=8.576e+02, percent-clipped=0.0
2024-10-08 22:02:53,159 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-10-08 22:03:11,313 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=487848.0, ans=15.0
2024-10-08 22:03:20,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=487848.0, ans=0.0
2024-10-08 22:03:22,122 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=487848.0, ans=0.1
2024-10-08 22:04:01,441 INFO [train.py:1136] (0/2) Epoch 50, batch 700, loss[loss=0.1783, simple_loss=0.2847, pruned_loss=0.03595, over 83330.00 frames. ], tot_loss[loss=0.1672, simple_loss=0.2689, pruned_loss=0.03277, over 16657220.23 frames. ], batch size: 1077, lr: 5.11e-03, grad_scale: 16.0
2024-10-08 22:04:09,914 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=488208.0, ans=0.125
2024-10-08 22:04:14,655 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=488208.0, ans=0.0
2024-10-08 22:04:16,673 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.85 vs. limit=15.0
2024-10-08 22:04:35,913 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.97 vs. limit=15.0
2024-10-08 22:04:40,107 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=488448.0, ans=0.0
2024-10-08 22:04:41,643 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=488448.0, ans=0.0
2024-10-08 22:04:48,009 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=488448.0, ans=0.0
2024-10-08 22:05:13,598 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=488688.0, ans=0.0
2024-10-08 22:05:24,176 INFO [train.py:1136] (0/2) Epoch 50, batch 750, loss[loss=0.1618, simple_loss=0.2601, pruned_loss=0.03172, over 87238.00 frames. ], tot_loss[loss=0.167, simple_loss=0.2687, pruned_loss=0.03266, over 16779285.73 frames. ], batch size: 313, lr: 5.11e-03, grad_scale: 16.0
2024-10-08 22:05:29,197 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=488808.0, ans=0.025
2024-10-08 22:05:35,155 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.362e+02 3.864e+02 4.159e+02 4.558e+02 1.301e+03, threshold=8.318e+02, percent-clipped=1.0
2024-10-08 22:05:43,551 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=488928.0, ans=0.125
2024-10-08 22:06:03,614 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=489048.0, ans=0.125
2024-10-08 22:06:42,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=489288.0, ans=0.2
2024-10-08 22:06:48,607 INFO [train.py:1136] (0/2) Epoch 50, batch 800, loss[loss=0.1813, simple_loss=0.2815, pruned_loss=0.04057, over 69476.00 frames. ], tot_loss[loss=0.1678, simple_loss=0.2694, pruned_loss=0.0331, over 16777566.12 frames. ], batch size: 1960, lr: 5.11e-03, grad_scale: 32.0
2024-10-08 22:06:48,946 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=489408.0, ans=0.125
2024-10-08 22:06:54,168 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=489408.0, ans=0.125
2024-10-08 22:07:02,557 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=6.79 vs. limit=15.0
2024-10-08 22:07:14,841 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-50.pt
2024-10-08 22:07:53,414 INFO [train.py:1136] (0/2) Epoch 51, batch 0, loss[loss=0.1629, simple_loss=0.2683, pruned_loss=0.02877, over 87189.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.2683, pruned_loss=0.02877, over 87189.00 frames. ], batch size: 439, lr: 5.06e-03, grad_scale: 32.0
2024-10-08 22:07:53,416 INFO [train.py:1159] (0/2) Computing validation loss
2024-10-08 22:08:05,431 INFO [train.py:1168] (0/2) Epoch 51, validation: loss=0.1663, simple_loss=0.274, pruned_loss=0.02936, over 1382211.00 frames.
2024-10-08 22:08:05,431 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB
2024-10-08 22:08:05,921 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=489600.0, ans=0.125
2024-10-08 22:08:14,912 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.44 vs. limit=10.0
2024-10-08 22:08:17,868 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=489600.0, ans=0.125
2024-10-08 22:08:47,065 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.93 vs. limit=15.0
2024-10-08 22:08:56,710 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=489840.0, ans=0.2
2024-10-08 22:08:58,535 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=489840.0, ans=0.125
2024-10-08 22:09:19,196 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.476e+02 4.073e+02 4.558e+02 5.095e+02 9.193e+02, threshold=9.117e+02, percent-clipped=1.0
2024-10-08 22:09:30,547 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.80 vs. limit=15.0
2024-10-08 22:09:40,473 INFO [train.py:1136] (0/2) Epoch 51, batch 50, loss[loss=0.1697, simple_loss=0.2664, pruned_loss=0.03651, over 87382.00 frames. ], tot_loss[loss=0.1653, simple_loss=0.2669, pruned_loss=0.03186, over 3864245.84 frames. ], batch size: 296, lr: 5.05e-03, grad_scale: 32.0
2024-10-08 22:10:03,319 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=490320.0, ans=0.025
2024-10-08 22:11:11,967 INFO [train.py:1136] (0/2) Epoch 51, batch 100, loss[loss=0.1763, simple_loss=0.2816, pruned_loss=0.03546, over 85217.00 frames. ], tot_loss[loss=0.1657, simple_loss=0.2674, pruned_loss=0.03206, over 6819767.85 frames. ], batch size: 866, lr: 5.05e-03, grad_scale: 32.0
2024-10-08 22:11:18,371 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.51 vs. limit=15.0
2024-10-08 22:11:34,172 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=490920.0, ans=0.1
2024-10-08 22:11:44,550 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-10-08 22:12:19,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=491160.0, ans=0.2
2024-10-08 22:12:32,752 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.443e+02 3.928e+02 4.397e+02 5.011e+02 7.050e+02, threshold=8.794e+02, percent-clipped=0.0
2024-10-08 22:12:33,261 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=491280.0, ans=0.125
2024-10-08 22:12:40,274 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=491280.0, ans=0.025
2024-10-08 22:12:40,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=491280.0, ans=0.125
2024-10-08 22:12:45,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=491280.0, ans=0.1
2024-10-08 22:12:49,993 INFO [train.py:1136] (0/2) Epoch 51, batch 150, loss[loss=0.1821, simple_loss=0.2869, pruned_loss=0.03864, over 85342.00 frames. ], tot_loss[loss=0.168, simple_loss=0.2695, pruned_loss=0.03324, over 9092692.14 frames. ], batch size: 786, lr: 5.05e-03, grad_scale: 32.0
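Note: the per-batch records above have a regular shape, so the tot_loss curve can be recovered from a log in this format with a regular expression. A self-contained sketch follows; the log file name is an assumption.

    import re

    PATTERN = re.compile(
        r"Epoch (\d+), batch (\d+), .*?"
        r"tot_loss\[loss=([\d.]+), simple_loss=([\d.]+), pruned_loss=([\d.]+)"
    )

    def parse_log(path):
        rows = []
        with open(path) as f:
            for line in f:
                m = PATTERN.search(line)
                if m:
                    epoch, batch = int(m.group(1)), int(m.group(2))
                    loss, simple, pruned = map(float, m.group(3, 4, 5))
                    rows.append((epoch, batch, loss, simple, pruned))
        return rows

    # usage (path is an assumption): rows = parse_log("log-train.txt")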
], batch size: 786, lr: 5.05e-03, grad_scale: 32.0 2024-10-08 22:13:35,341 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=491640.0, ans=0.07 2024-10-08 22:13:56,739 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=491760.0, ans=0.2 2024-10-08 22:14:06,140 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=491760.0, ans=0.2 2024-10-08 22:14:26,225 INFO [train.py:1136] (0/2) Epoch 51, batch 200, loss[loss=0.1743, simple_loss=0.2821, pruned_loss=0.03321, over 84531.00 frames. ], tot_loss[loss=0.1679, simple_loss=0.2693, pruned_loss=0.03321, over 10901321.48 frames. ], batch size: 958, lr: 5.04e-03, grad_scale: 32.0 2024-10-08 22:14:37,679 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.43 vs. limit=15.0 2024-10-08 22:14:45,593 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.95 vs. limit=15.0 2024-10-08 22:15:36,198 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.66 vs. limit=15.0 2024-10-08 22:15:40,268 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.458e+02 3.888e+02 4.251e+02 4.741e+02 6.247e+02, threshold=8.502e+02, percent-clipped=0.0 2024-10-08 22:15:44,307 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=492480.0, ans=0.1 2024-10-08 22:16:02,631 INFO [train.py:1136] (0/2) Epoch 51, batch 250, loss[loss=0.1607, simple_loss=0.2696, pruned_loss=0.02592, over 87223.00 frames. ], tot_loss[loss=0.1675, simple_loss=0.2689, pruned_loss=0.03307, over 12292439.07 frames. ], batch size: 464, lr: 5.04e-03, grad_scale: 32.0 2024-10-08 22:16:40,887 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=15.0 2024-10-08 22:16:45,680 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=492840.0, ans=0.125 2024-10-08 22:16:53,724 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=492840.0, ans=0.025 2024-10-08 22:16:53,975 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.09 vs. limit=15.0 2024-10-08 22:17:32,856 INFO [train.py:1136] (0/2) Epoch 51, batch 300, loss[loss=0.156, simple_loss=0.256, pruned_loss=0.02804, over 86654.00 frames. ], tot_loss[loss=0.1676, simple_loss=0.2691, pruned_loss=0.03304, over 13367893.21 frames. ], batch size: 229, lr: 5.04e-03, grad_scale: 32.0 2024-10-08 22:18:01,749 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=5.57 vs. 
limit=15.0 2024-10-08 22:18:09,684 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=493320.0, ans=0.125 2024-10-08 22:18:09,686 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=493320.0, ans=0.2 2024-10-08 22:18:54,062 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.533e+02 4.002e+02 4.468e+02 4.868e+02 1.195e+03, threshold=8.935e+02, percent-clipped=2.0 2024-10-08 22:18:58,220 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=493680.0, ans=0.0 2024-10-08 22:19:09,772 INFO [train.py:1136] (0/2) Epoch 51, batch 350, loss[loss=0.1661, simple_loss=0.266, pruned_loss=0.03304, over 87161.00 frames. ], tot_loss[loss=0.1675, simple_loss=0.2689, pruned_loss=0.03305, over 14208267.74 frames. ], batch size: 330, lr: 5.04e-03, grad_scale: 16.0 2024-10-08 22:19:56,331 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=494040.0, ans=0.05 2024-10-08 22:20:05,001 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=494160.0, ans=0.1 2024-10-08 22:20:18,580 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=494160.0, ans=0.2 2024-10-08 22:20:45,584 INFO [train.py:1136] (0/2) Epoch 51, batch 400, loss[loss=0.1579, simple_loss=0.2614, pruned_loss=0.0272, over 87398.00 frames. ], tot_loss[loss=0.167, simple_loss=0.2686, pruned_loss=0.0327, over 14878589.51 frames. ], batch size: 415, lr: 5.03e-03, grad_scale: 32.0 2024-10-08 22:20:51,956 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.80 vs. limit=15.0 2024-10-08 22:21:03,842 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=494520.0, ans=0.04949747468305833 2024-10-08 22:21:13,374 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.69 vs. limit=15.0 2024-10-08 22:22:04,930 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.436e+02 3.966e+02 4.326e+02 4.941e+02 7.128e+02, threshold=8.653e+02, percent-clipped=0.0 2024-10-08 22:22:22,588 INFO [train.py:1136] (0/2) Epoch 51, batch 450, loss[loss=0.1653, simple_loss=0.2629, pruned_loss=0.03387, over 87096.00 frames. ], tot_loss[loss=0.1673, simple_loss=0.269, pruned_loss=0.03283, over 15349455.47 frames. 
], batch size: 350, lr: 5.03e-03, grad_scale: 32.0 2024-10-08 22:22:56,183 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=495120.0, ans=0.025 2024-10-08 22:23:15,552 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=495240.0, ans=0.0 2024-10-08 22:23:29,167 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=495360.0, ans=0.2 2024-10-08 22:23:39,594 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=495480.0, ans=0.125 2024-10-08 22:23:46,546 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=495480.0, ans=0.0 2024-10-08 22:23:55,254 INFO [train.py:1136] (0/2) Epoch 51, batch 500, loss[loss=0.1616, simple_loss=0.2572, pruned_loss=0.03296, over 86626.00 frames. ], tot_loss[loss=0.1675, simple_loss=0.269, pruned_loss=0.03301, over 15767311.72 frames. ], batch size: 213, lr: 5.03e-03, grad_scale: 32.0 2024-10-08 22:24:02,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=495600.0, ans=0.125 2024-10-08 22:24:22,933 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=495720.0, ans=0.0 2024-10-08 22:24:40,944 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-10-08 22:24:44,901 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.whiten.whitening_limit, batch_count=495840.0, ans=12.0 2024-10-08 22:25:13,934 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.560e+02 4.271e+02 5.046e+02 5.758e+02 8.274e+02, threshold=1.009e+03, percent-clipped=0.0 2024-10-08 22:25:16,065 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=496080.0, ans=0.0 2024-10-08 22:25:31,851 INFO [train.py:1136] (0/2) Epoch 51, batch 550, loss[loss=0.1731, simple_loss=0.2768, pruned_loss=0.03464, over 85880.00 frames. ], tot_loss[loss=0.1673, simple_loss=0.2688, pruned_loss=0.03292, over 16070131.86 frames. ], batch size: 721, lr: 5.02e-03, grad_scale: 32.0 2024-10-08 22:26:01,582 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=496320.0, ans=0.0 2024-10-08 22:26:21,031 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=496440.0, ans=0.0 2024-10-08 22:27:04,984 INFO [train.py:1136] (0/2) Epoch 51, batch 600, loss[loss=0.1538, simple_loss=0.2537, pruned_loss=0.02695, over 87016.00 frames. ], tot_loss[loss=0.1666, simple_loss=0.2682, pruned_loss=0.03255, over 16312361.83 frames. ], batch size: 264, lr: 5.02e-03, grad_scale: 32.0 2024-10-08 22:27:32,153 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=496920.0, ans=0.125 2024-10-08 22:27:41,404 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.78 vs. 
limit=15.0 2024-10-08 22:27:54,592 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=497040.0, ans=0.125 2024-10-08 22:28:02,099 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=497160.0, ans=0.1 2024-10-08 22:28:12,376 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=497160.0, ans=0.125 2024-10-08 22:28:22,004 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.21 vs. limit=15.0 2024-10-08 22:28:22,706 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.401e+02 3.871e+02 4.153e+02 4.550e+02 6.002e+02, threshold=8.306e+02, percent-clipped=0.0 2024-10-08 22:28:26,555 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=497280.0, ans=0.0 2024-10-08 22:28:30,592 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.97 vs. limit=22.5 2024-10-08 22:28:35,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=497280.0, ans=0.125 2024-10-08 22:28:38,439 INFO [train.py:1136] (0/2) Epoch 51, batch 650, loss[loss=0.1759, simple_loss=0.2831, pruned_loss=0.0344, over 82042.00 frames. ], tot_loss[loss=0.1667, simple_loss=0.2684, pruned_loss=0.03255, over 16486871.83 frames. ], batch size: 1245, lr: 5.02e-03, grad_scale: 32.0 2024-10-08 22:28:54,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=497400.0, ans=0.2 2024-10-08 22:29:54,685 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=3.59 vs. limit=15.0 2024-10-08 22:30:09,934 INFO [train.py:1136] (0/2) Epoch 51, batch 700, loss[loss=0.1623, simple_loss=0.2659, pruned_loss=0.02934, over 87207.00 frames. ], tot_loss[loss=0.1666, simple_loss=0.2682, pruned_loss=0.03245, over 16618218.14 frames. ], batch size: 415, lr: 5.01e-03, grad_scale: 32.0 2024-10-08 22:30:37,610 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=498120.0, ans=0.125 2024-10-08 22:30:38,193 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.23 vs. 
limit=15.0 2024-10-08 22:30:51,749 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=498240.0, ans=0.0 2024-10-08 22:30:53,549 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-10-08 22:31:01,384 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=498360.0, ans=0.0 2024-10-08 22:31:04,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=498360.0, ans=0.0 2024-10-08 22:31:17,500 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.194e+02 4.113e+02 4.639e+02 5.259e+02 7.593e+02, threshold=9.277e+02, percent-clipped=0.0 2024-10-08 22:31:26,473 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.02 vs. limit=10.0 2024-10-08 22:31:31,961 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=498600.0, ans=0.1 2024-10-08 22:31:34,791 INFO [train.py:1136] (0/2) Epoch 51, batch 750, loss[loss=0.1986, simple_loss=0.2995, pruned_loss=0.04885, over 78969.00 frames. ], tot_loss[loss=0.1664, simple_loss=0.2682, pruned_loss=0.03227, over 16734507.56 frames. ], batch size: 1493, lr: 5.01e-03, grad_scale: 32.0 2024-10-08 22:31:38,684 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.38 vs. limit=15.0 2024-10-08 22:31:43,012 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=498600.0, ans=0.95 2024-10-08 22:32:17,005 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=498840.0, ans=0.125 2024-10-08 22:32:18,771 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-10-08 22:32:58,206 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.22 vs. limit=15.0 2024-10-08 22:32:58,484 INFO [train.py:1136] (0/2) Epoch 51, batch 800, loss[loss=0.1782, simple_loss=0.2815, pruned_loss=0.03743, over 85809.00 frames. ], tot_loss[loss=0.1668, simple_loss=0.2687, pruned_loss=0.03244, over 16808875.04 frames. ], batch size: 720, lr: 5.01e-03, grad_scale: 32.0 2024-10-08 22:33:12,182 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.88 vs. limit=15.0 2024-10-08 22:33:27,131 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-51.pt 2024-10-08 22:34:07,982 INFO [train.py:1136] (0/2) Epoch 52, batch 0, loss[loss=0.1808, simple_loss=0.2884, pruned_loss=0.03663, over 85888.00 frames. ], tot_loss[loss=0.1808, simple_loss=0.2884, pruned_loss=0.03663, over 85888.00 frames. ], batch size: 721, lr: 4.96e-03, grad_scale: 16.0 2024-10-08 22:34:07,983 INFO [train.py:1159] (0/2) Computing validation loss 2024-10-08 22:34:19,365 INFO [train.py:1168] (0/2) Epoch 52, validation: loss=0.1682, simple_loss=0.2768, pruned_loss=0.02978, over 1382211.00 frames. 
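A note on the recurring WARNING [optim.py:487] lines: the optimizer tracks the total gradient norm over recent batches and clips against a threshold derived from that history. In the numbers logged here the threshold is clipping_scale (2.0) times the median of the five reported values (min / Q1 / median / Q3 / max), e.g. 2.0 x 4.153e+02 = 8.306e+02 in one of the entries above, and percent-clipped reports how often the norm exceeded the threshold. A minimal sketch of that idea follows, assuming a rolling window of norms and a fixed reporting cadence; the class name GradNormClipper, the window size, the 50-step cadence, and the cumulative percent accounting are illustrative assumptions, not the actual optim.py implementation.

    import torch
    from collections import deque

    class GradNormClipper:
        """Quartile-based gradient clipping (sketch): keep recent total
        gradient norms and clip against clipping_scale * median."""

        def __init__(self, clipping_scale=2.0, window=128, log_every=50):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=window)  # recent total grad norms
            self.log_every = log_every
            self.steps = 0
            self.clipped = 0

        def __call__(self, parameters):
            params = [p for p in parameters if p.grad is not None]
            # Total gradient norm across all parameters for this step.
            norm = torch.norm(
                torch.stack([p.grad.detach().norm() for p in params])
            ).item()
            self.norms.append(norm)
            self.steps += 1
            s = sorted(self.norms)
            n = len(s)
            # min / Q1 / median / Q3 / max, as in the log lines above.
            quartiles = [s[int(q * (n - 1))] for q in (0.0, 0.25, 0.5, 0.75, 1.0)]
            threshold = self.clipping_scale * quartiles[2]
            if norm > threshold:
                self.clipped += 1
                for p in params:
                    p.grad.mul_(threshold / norm)  # rescale gradients in place
            if self.steps % self.log_every == 0:
                # Fraction of steps clipped so far; the real log's exact
                # accounting window is not shown here (assumption).
                pct = 100.0 * self.clipped / self.steps
                print(f"Clipping_scale={self.clipping_scale}, grad-norm quartiles "
                      + " ".join(f"{q:.3e}" for q in quartiles)
                      + f", threshold={threshold:.3e}, percent-clipped={pct:.1f}")

The occasional spikes visible in the logged maxima (e.g. 1.754e+03 against a threshold of 9.527e+02 above) are exactly the steps this mechanism rescales, which is why percent-clipped is usually 0.0 and only occasionally 1.0 or 2.0.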
2024-10-08 22:34:19,366 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB 2024-10-08 22:34:26,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=499392.0, ans=0.2 2024-10-08 22:34:39,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=499512.0, ans=0.1 2024-10-08 22:34:49,415 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=499512.0, ans=0.125 2024-10-08 22:34:53,309 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.59 vs. limit=15.0 2024-10-08 22:34:54,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=499512.0, ans=0.125 2024-10-08 22:34:56,488 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=499632.0, ans=0.125 2024-10-08 22:35:12,785 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.375e+02 4.079e+02 4.764e+02 5.453e+02 1.754e+03, threshold=9.527e+02, percent-clipped=1.0 2024-10-08 22:35:38,477 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=499872.0, ans=0.1 2024-10-08 22:35:55,116 INFO [train.py:1136] (0/2) Epoch 52, batch 50, loss[loss=0.1604, simple_loss=0.2622, pruned_loss=0.02937, over 87342.00 frames. ], tot_loss[loss=0.1659, simple_loss=0.2678, pruned_loss=0.03198, over 3905707.12 frames. ], batch size: 372, lr: 4.96e-03, grad_scale: 16.0 2024-10-08 22:36:03,041 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.31 vs. limit=15.0 2024-10-08 22:36:22,888 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.97 vs. limit=15.0 2024-10-08 22:36:40,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=500232.0, ans=0.125 2024-10-08 22:37:12,309 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=500472.0, ans=0.125 2024-10-08 22:37:29,241 INFO [train.py:1136] (0/2) Epoch 52, batch 100, loss[loss=0.1848, simple_loss=0.286, pruned_loss=0.04178, over 69753.00 frames. ], tot_loss[loss=0.1654, simple_loss=0.2667, pruned_loss=0.0321, over 6830770.78 frames. ], batch size: 1960, lr: 4.95e-03, grad_scale: 16.0 2024-10-08 22:37:29,660 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=500592.0, ans=0.125 2024-10-08 22:37:34,023 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=500592.0, ans=0.0 2024-10-08 22:37:51,445 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.44 vs. 
limit=15.0 2024-10-08 22:38:03,714 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=500712.0, ans=0.0 2024-10-08 22:38:12,160 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=500832.0, ans=0.125 2024-10-08 22:38:12,259 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=500832.0, ans=0.0 2024-10-08 22:38:20,267 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.365e+02 3.719e+02 4.166e+02 4.686e+02 6.445e+02, threshold=8.332e+02, percent-clipped=0.0 2024-10-08 22:38:33,331 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.83 vs. limit=15.0 2024-10-08 22:38:44,422 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=501072.0, ans=15.0 2024-10-08 22:39:02,335 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=501072.0, ans=0.1 2024-10-08 22:39:05,220 INFO [train.py:1136] (0/2) Epoch 52, batch 150, loss[loss=0.1587, simple_loss=0.2534, pruned_loss=0.03201, over 86570.00 frames. ], tot_loss[loss=0.1659, simple_loss=0.2672, pruned_loss=0.03226, over 9141832.16 frames. ], batch size: 213, lr: 4.95e-03, grad_scale: 16.0 2024-10-08 22:39:09,330 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=501192.0, ans=0.0 2024-10-08 22:39:32,023 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.82 vs. limit=15.0 2024-10-08 22:40:40,231 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=501672.0, ans=0.125 2024-10-08 22:40:43,332 INFO [train.py:1136] (0/2) Epoch 52, batch 200, loss[loss=0.175, simple_loss=0.2784, pruned_loss=0.03576, over 86353.00 frames. ], tot_loss[loss=0.1662, simple_loss=0.2681, pruned_loss=0.03215, over 10928582.95 frames. ], batch size: 667, lr: 4.95e-03, grad_scale: 16.0 2024-10-08 22:41:03,759 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.18 vs. limit=10.0 2024-10-08 22:41:31,201 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=502032.0, ans=0.125 2024-10-08 22:41:34,669 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=502032.0, ans=0.07 2024-10-08 22:41:35,829 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.488e+02 4.012e+02 4.448e+02 5.274e+02 1.046e+03, threshold=8.896e+02, percent-clipped=2.0 2024-10-08 22:42:05,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=502272.0, ans=0.0 2024-10-08 22:42:09,260 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=502272.0, ans=0.125 2024-10-08 22:42:17,937 INFO [train.py:1136] (0/2) Epoch 52, batch 250, loss[loss=0.1772, simple_loss=0.282, pruned_loss=0.03615, over 86979.00 frames. 
], tot_loss[loss=0.1664, simple_loss=0.2683, pruned_loss=0.03228, over 12301248.60 frames. ], batch size: 583, lr: 4.94e-03, grad_scale: 16.0 2024-10-08 22:42:18,550 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=502392.0, ans=0.1 2024-10-08 22:42:32,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=502392.0, ans=0.1 2024-10-08 22:43:02,458 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=502632.0, ans=0.2 2024-10-08 22:43:50,661 INFO [train.py:1136] (0/2) Epoch 52, batch 300, loss[loss=0.1666, simple_loss=0.2574, pruned_loss=0.03791, over 85624.00 frames. ], tot_loss[loss=0.166, simple_loss=0.2679, pruned_loss=0.03208, over 13363420.73 frames. ], batch size: 180, lr: 4.94e-03, grad_scale: 16.0 2024-10-08 22:43:53,091 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=502992.0, ans=0.0 2024-10-08 22:44:16,898 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.45 vs. limit=15.0 2024-10-08 22:44:19,911 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=503112.0, ans=0.125 2024-10-08 22:44:23,626 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=9.86 vs. limit=22.5 2024-10-08 22:44:33,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=503232.0, ans=0.125 2024-10-08 22:44:41,652 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.271e+02 3.886e+02 4.309e+02 4.801e+02 6.530e+02, threshold=8.617e+02, percent-clipped=0.0 2024-10-08 22:45:24,515 INFO [train.py:1136] (0/2) Epoch 52, batch 350, loss[loss=0.1548, simple_loss=0.2521, pruned_loss=0.02878, over 86475.00 frames. ], tot_loss[loss=0.1664, simple_loss=0.2681, pruned_loss=0.03236, over 14172059.64 frames. ], batch size: 213, lr: 4.94e-03, grad_scale: 16.0 2024-10-08 22:45:40,783 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=503712.0, ans=0.0 2024-10-08 22:45:54,053 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.52 vs. limit=10.0 2024-10-08 22:46:04,302 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=503832.0, ans=0.1 2024-10-08 22:46:32,726 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=503952.0, ans=0.0 2024-10-08 22:46:36,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=503952.0, ans=0.0 2024-10-08 22:47:01,391 INFO [train.py:1136] (0/2) Epoch 52, batch 400, loss[loss=0.1552, simple_loss=0.2546, pruned_loss=0.0279, over 86783.00 frames. ], tot_loss[loss=0.1665, simple_loss=0.2682, pruned_loss=0.03237, over 14789586.75 frames. 
], batch size: 229, lr: 4.94e-03, grad_scale: 32.0 2024-10-08 22:47:03,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=504192.0, ans=0.125 2024-10-08 22:47:09,524 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=504192.0, ans=0.125 2024-10-08 22:47:26,213 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=504312.0, ans=0.0 2024-10-08 22:47:29,879 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=504312.0, ans=0.0 2024-10-08 22:47:42,582 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.39 vs. limit=22.5 2024-10-08 22:47:53,323 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2024-10-08 22:47:54,063 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.509e+02 4.054e+02 4.903e+02 5.403e+02 8.576e+02, threshold=9.806e+02, percent-clipped=0.0 2024-10-08 22:48:01,176 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=504552.0, ans=0.125 2024-10-08 22:48:04,816 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=504552.0, ans=0.2 2024-10-08 22:48:08,605 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=504552.0, ans=0.2 2024-10-08 22:48:38,911 INFO [train.py:1136] (0/2) Epoch 52, batch 450, loss[loss=0.1597, simple_loss=0.2671, pruned_loss=0.02619, over 87354.00 frames. ], tot_loss[loss=0.1669, simple_loss=0.2686, pruned_loss=0.03261, over 15242433.05 frames. ], batch size: 490, lr: 4.93e-03, grad_scale: 32.0 2024-10-08 22:48:55,431 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=504912.0, ans=0.125 2024-10-08 22:48:55,558 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=504912.0, ans=0.0 2024-10-08 22:49:50,513 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=505152.0, ans=0.1 2024-10-08 22:49:52,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=505272.0, ans=0.5 2024-10-08 22:50:04,383 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=505272.0, ans=0.2 2024-10-08 22:50:10,779 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.34 vs. limit=22.5 2024-10-08 22:50:13,097 INFO [train.py:1136] (0/2) Epoch 52, batch 500, loss[loss=0.1769, simple_loss=0.2777, pruned_loss=0.03803, over 87077.00 frames. ], tot_loss[loss=0.1668, simple_loss=0.2684, pruned_loss=0.03263, over 15680677.93 frames. 
], batch size: 583, lr: 4.93e-03, grad_scale: 32.0 2024-10-08 22:50:54,186 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=505632.0, ans=0.1 2024-10-08 22:51:04,761 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.486e+02 3.880e+02 4.251e+02 4.750e+02 6.652e+02, threshold=8.502e+02, percent-clipped=0.0 2024-10-08 22:51:08,636 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=505752.0, ans=0.125 2024-10-08 22:51:41,090 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.28 vs. limit=12.0 2024-10-08 22:51:46,936 INFO [train.py:1136] (0/2) Epoch 52, batch 550, loss[loss=0.177, simple_loss=0.2846, pruned_loss=0.03469, over 85505.00 frames. ], tot_loss[loss=0.1672, simple_loss=0.2688, pruned_loss=0.03283, over 15969553.83 frames. ], batch size: 787, lr: 4.93e-03, grad_scale: 32.0 2024-10-08 22:51:59,792 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=505992.0, ans=0.0 2024-10-08 22:52:03,287 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=505992.0, ans=0.0 2024-10-08 22:52:17,840 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=506112.0, ans=0.125 2024-10-08 22:52:34,013 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=506232.0, ans=0.0 2024-10-08 22:52:35,684 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=506232.0, ans=0.07 2024-10-08 22:53:12,483 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=506472.0, ans=0.125 2024-10-08 22:53:23,253 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=506592.0, ans=0.2 2024-10-08 22:53:24,360 INFO [train.py:1136] (0/2) Epoch 52, batch 600, loss[loss=0.1749, simple_loss=0.2751, pruned_loss=0.03736, over 86946.00 frames. ], tot_loss[loss=0.1674, simple_loss=0.2692, pruned_loss=0.03281, over 16223599.89 frames. ], batch size: 583, lr: 4.92e-03, grad_scale: 32.0 2024-10-08 22:53:39,053 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=506592.0, ans=0.025 2024-10-08 22:54:02,070 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=506832.0, ans=0.2 2024-10-08 22:54:05,725 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=3.23 vs. 
limit=12.0 2024-10-08 22:54:15,214 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.460e+02 3.925e+02 4.337e+02 4.949e+02 5.938e+02, threshold=8.674e+02, percent-clipped=0.0 2024-10-08 22:54:27,270 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=506952.0, ans=0.2 2024-10-08 22:54:34,800 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=506952.0, ans=0.125 2024-10-08 22:54:49,765 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=507072.0, ans=0.0 2024-10-08 22:54:59,435 INFO [train.py:1136] (0/2) Epoch 52, batch 650, loss[loss=0.1674, simple_loss=0.2695, pruned_loss=0.03268, over 86530.00 frames. ], tot_loss[loss=0.1669, simple_loss=0.2688, pruned_loss=0.03248, over 16426861.83 frames. ], batch size: 620, lr: 4.92e-03, grad_scale: 32.0 2024-10-08 22:55:24,475 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.36 vs. limit=15.0 2024-10-08 22:55:36,450 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=507432.0, ans=0.07 2024-10-08 22:56:07,255 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=507552.0, ans=0.125 2024-10-08 22:56:08,696 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=507552.0, ans=0.125 2024-10-08 22:56:13,607 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=507672.0, ans=0.1 2024-10-08 22:56:21,653 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=507672.0, ans=0.025 2024-10-08 22:56:27,629 INFO [train.py:1136] (0/2) Epoch 52, batch 700, loss[loss=0.1746, simple_loss=0.2766, pruned_loss=0.03626, over 86409.00 frames. ], tot_loss[loss=0.1662, simple_loss=0.2681, pruned_loss=0.03216, over 16607743.14 frames. ], batch size: 668, lr: 4.92e-03, grad_scale: 32.0 2024-10-08 22:56:41,090 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=507792.0, ans=0.125 2024-10-08 22:57:15,079 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.254e+02 3.822e+02 4.119e+02 4.668e+02 6.511e+02, threshold=8.237e+02, percent-clipped=0.0 2024-10-08 22:57:15,946 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.96 vs. limit=22.5 2024-10-08 22:57:22,438 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=508152.0, ans=10.0 2024-10-08 22:57:52,841 INFO [train.py:1136] (0/2) Epoch 52, batch 750, loss[loss=0.158, simple_loss=0.2611, pruned_loss=0.02741, over 87439.00 frames. ], tot_loss[loss=0.1671, simple_loss=0.269, pruned_loss=0.03256, over 16666238.56 frames. 
], batch size: 372, lr: 4.92e-03, grad_scale: 32.0 2024-10-08 22:58:01,041 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=508392.0, ans=0.2 2024-10-08 22:58:05,536 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=508392.0, ans=0.125 2024-10-08 22:58:47,887 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=508752.0, ans=0.125 2024-10-08 22:58:52,756 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=508752.0, ans=0.125 2024-10-08 22:59:17,436 INFO [train.py:1136] (0/2) Epoch 52, batch 800, loss[loss=0.1717, simple_loss=0.2767, pruned_loss=0.03338, over 86402.00 frames. ], tot_loss[loss=0.1677, simple_loss=0.2699, pruned_loss=0.03278, over 16697114.89 frames. ], batch size: 667, lr: 4.91e-03, grad_scale: 32.0 2024-10-08 22:59:43,648 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-52.pt 2024-10-08 23:00:19,764 INFO [train.py:1136] (0/2) Epoch 53, batch 0, loss[loss=0.1659, simple_loss=0.271, pruned_loss=0.03046, over 86318.00 frames. ], tot_loss[loss=0.1659, simple_loss=0.271, pruned_loss=0.03046, over 86318.00 frames. ], batch size: 667, lr: 4.87e-03, grad_scale: 32.0 2024-10-08 23:00:19,765 INFO [train.py:1159] (0/2) Computing validation loss 2024-10-08 23:00:31,070 INFO [train.py:1168] (0/2) Epoch 53, validation: loss=0.167, simple_loss=0.2743, pruned_loss=0.02981, over 1382211.00 frames. 2024-10-08 23:00:31,071 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB 2024-10-08 23:00:48,391 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.20 vs. limit=15.0 2024-10-08 23:00:55,757 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.97 vs. limit=22.5 2024-10-08 23:00:56,686 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.488e+02 4.214e+02 4.676e+02 5.142e+02 8.944e+02, threshold=9.352e+02, percent-clipped=1.0 2024-10-08 23:01:01,970 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.94 vs. limit=8.0 2024-10-08 23:01:11,239 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=509424.0, ans=0.125 2024-10-08 23:01:18,256 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=509424.0, ans=0.125 2024-10-08 23:01:20,627 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.36 vs. limit=6.0 2024-10-08 23:01:59,637 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.47 vs. limit=15.0 2024-10-08 23:02:08,630 INFO [train.py:1136] (0/2) Epoch 53, batch 50, loss[loss=0.1612, simple_loss=0.2551, pruned_loss=0.03366, over 86361.00 frames. ], tot_loss[loss=0.1698, simple_loss=0.2726, pruned_loss=0.03349, over 3827778.37 frames. 
], batch size: 197, lr: 4.86e-03, grad_scale: 32.0 2024-10-08 23:02:22,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=509784.0, ans=0.125 2024-10-08 23:03:22,601 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=510144.0, ans=0.125 2024-10-08 23:03:44,338 INFO [train.py:1136] (0/2) Epoch 53, batch 100, loss[loss=0.1633, simple_loss=0.2614, pruned_loss=0.03267, over 87317.00 frames. ], tot_loss[loss=0.1679, simple_loss=0.2703, pruned_loss=0.03271, over 6776683.97 frames. ], batch size: 313, lr: 4.86e-03, grad_scale: 32.0 2024-10-08 23:04:04,929 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.324e+02 4.029e+02 4.289e+02 4.859e+02 5.791e+02, threshold=8.578e+02, percent-clipped=0.0 2024-10-08 23:04:27,849 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=510624.0, ans=0.1 2024-10-08 23:04:31,693 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.47 vs. limit=10.0 2024-10-08 23:04:41,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=510744.0, ans=0.125 2024-10-08 23:04:54,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=510744.0, ans=0.2 2024-10-08 23:05:07,438 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.83 vs. limit=15.0 2024-10-08 23:05:11,058 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=510864.0, ans=0.125 2024-10-08 23:05:12,853 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=510864.0, ans=0.125 2024-10-08 23:05:20,756 INFO [train.py:1136] (0/2) Epoch 53, batch 150, loss[loss=0.1699, simple_loss=0.2779, pruned_loss=0.03095, over 84619.00 frames. ], tot_loss[loss=0.167, simple_loss=0.269, pruned_loss=0.03252, over 9048057.67 frames. ], batch size: 958, lr: 4.86e-03, grad_scale: 32.0 2024-10-08 23:05:24,789 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=510984.0, ans=0.0 2024-10-08 23:06:05,666 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=511224.0, ans=0.2 2024-10-08 23:06:05,690 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=511224.0, ans=0.1 2024-10-08 23:06:13,971 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=511224.0, ans=0.2 2024-10-08 23:06:17,409 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=511344.0, ans=0.1 2024-10-08 23:06:18,356 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.04 vs. 
limit=12.0 2024-10-08 23:06:46,753 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=511464.0, ans=0.125 2024-10-08 23:06:54,148 INFO [train.py:1136] (0/2) Epoch 53, batch 200, loss[loss=0.1614, simple_loss=0.2597, pruned_loss=0.03159, over 87227.00 frames. ], tot_loss[loss=0.1659, simple_loss=0.268, pruned_loss=0.03194, over 10863721.00 frames. ], batch size: 264, lr: 4.85e-03, grad_scale: 32.0 2024-10-08 23:06:58,844 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.75 vs. limit=15.0 2024-10-08 23:07:16,844 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=511704.0, ans=0.125 2024-10-08 23:07:18,117 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.300e+02 3.846e+02 4.384e+02 4.899e+02 7.399e+02, threshold=8.768e+02, percent-clipped=0.0 2024-10-08 23:07:18,548 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=511704.0, ans=10.0 2024-10-08 23:07:42,120 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=511824.0, ans=0.2 2024-10-08 23:07:53,611 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=511944.0, ans=0.025 2024-10-08 23:08:31,390 INFO [train.py:1136] (0/2) Epoch 53, batch 250, loss[loss=0.1691, simple_loss=0.2722, pruned_loss=0.03297, over 86384.00 frames. ], tot_loss[loss=0.166, simple_loss=0.2675, pruned_loss=0.0322, over 12253601.18 frames. ], batch size: 668, lr: 4.85e-03, grad_scale: 32.0 2024-10-08 23:09:34,548 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=512544.0, ans=0.125 2024-10-08 23:09:37,877 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=512544.0, ans=0.0 2024-10-08 23:10:05,962 INFO [train.py:1136] (0/2) Epoch 53, batch 300, loss[loss=0.1606, simple_loss=0.2647, pruned_loss=0.0282, over 87370.00 frames. ], tot_loss[loss=0.1661, simple_loss=0.2677, pruned_loss=0.03228, over 13287777.44 frames. ], batch size: 372, lr: 4.85e-03, grad_scale: 32.0 2024-10-08 23:10:16,055 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-10-08 23:10:29,306 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.441e+02 3.841e+02 4.260e+02 4.806e+02 8.294e+02, threshold=8.520e+02, percent-clipped=0.0 2024-10-08 23:10:33,105 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=512904.0, ans=0.07 2024-10-08 23:10:48,096 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.17 vs. 
limit=22.5 2024-10-08 23:10:56,235 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=513024.0, ans=0.2 2024-10-08 23:11:20,786 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=513264.0, ans=0.05 2024-10-08 23:11:32,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=513264.0, ans=0.1 2024-10-08 23:11:32,123 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=513264.0, ans=0.0 2024-10-08 23:11:37,371 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=513264.0, ans=0.1 2024-10-08 23:11:38,181 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.21 vs. limit=15.0 2024-10-08 23:11:41,964 INFO [train.py:1136] (0/2) Epoch 53, batch 350, loss[loss=0.1659, simple_loss=0.267, pruned_loss=0.03239, over 87103.00 frames. ], tot_loss[loss=0.1665, simple_loss=0.2683, pruned_loss=0.03237, over 14159472.47 frames. ], batch size: 330, lr: 4.85e-03, grad_scale: 32.0 2024-10-08 23:12:09,304 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=513504.0, ans=0.0 2024-10-08 23:12:13,165 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.39 vs. limit=15.0 2024-10-08 23:12:27,129 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=513624.0, ans=0.125 2024-10-08 23:12:30,782 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=513624.0, ans=0.125 2024-10-08 23:12:46,697 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=513744.0, ans=10.0 2024-10-08 23:13:05,721 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=513864.0, ans=0.125 2024-10-08 23:13:15,731 INFO [train.py:1136] (0/2) Epoch 53, batch 400, loss[loss=0.1884, simple_loss=0.2918, pruned_loss=0.0425, over 78829.00 frames. ], tot_loss[loss=0.1664, simple_loss=0.2684, pruned_loss=0.03223, over 14818677.28 frames. ], batch size: 1493, lr: 4.84e-03, grad_scale: 32.0 2024-10-08 23:13:25,760 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=513984.0, ans=0.125 2024-10-08 23:13:38,693 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.271e+02 3.937e+02 4.170e+02 4.672e+02 7.210e+02, threshold=8.340e+02, percent-clipped=0.0 2024-10-08 23:13:53,785 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=514224.0, ans=0.0 2024-10-08 23:14:51,253 INFO [train.py:1136] (0/2) Epoch 53, batch 450, loss[loss=0.1755, simple_loss=0.2748, pruned_loss=0.03807, over 87032.00 frames. ], tot_loss[loss=0.166, simple_loss=0.268, pruned_loss=0.03202, over 15345198.91 frames. 
], batch size: 583, lr: 4.84e-03, grad_scale: 32.0 2024-10-08 23:15:13,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=514704.0, ans=0.1 2024-10-08 23:15:44,916 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=514824.0, ans=0.125 2024-10-08 23:15:56,870 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=514944.0, ans=0.05 2024-10-08 23:16:03,966 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=515064.0, ans=0.125 2024-10-08 23:16:16,507 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=515064.0, ans=0.0 2024-10-08 23:16:25,146 INFO [train.py:1136] (0/2) Epoch 53, batch 500, loss[loss=0.1707, simple_loss=0.2749, pruned_loss=0.03327, over 85602.00 frames. ], tot_loss[loss=0.1656, simple_loss=0.2677, pruned_loss=0.03179, over 15767936.90 frames. ], batch size: 787, lr: 4.84e-03, grad_scale: 16.0 2024-10-08 23:16:31,658 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=515184.0, ans=0.02 2024-10-08 23:16:50,229 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.395e+02 3.899e+02 4.215e+02 4.917e+02 8.024e+02, threshold=8.429e+02, percent-clipped=0.0 2024-10-08 23:16:57,875 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=515304.0, ans=0.125 2024-10-08 23:17:05,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=515424.0, ans=0.0 2024-10-08 23:17:16,020 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=515424.0, ans=0.1 2024-10-08 23:17:41,609 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=515664.0, ans=0.1 2024-10-08 23:17:45,188 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=515664.0, ans=0.125 2024-10-08 23:17:52,263 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=515664.0, ans=0.0 2024-10-08 23:17:57,295 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-10-08 23:17:58,527 INFO [train.py:1136] (0/2) Epoch 53, batch 550, loss[loss=0.1599, simple_loss=0.2672, pruned_loss=0.02632, over 87328.00 frames. ], tot_loss[loss=0.1656, simple_loss=0.2677, pruned_loss=0.03179, over 16084943.44 frames. ], batch size: 439, lr: 4.83e-03, grad_scale: 16.0 2024-10-08 23:18:14,631 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=3.69 vs. 
limit=15.0 2024-10-08 23:18:30,021 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=515904.0, ans=0.125 2024-10-08 23:18:31,401 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=515904.0, ans=0.1 2024-10-08 23:18:42,052 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=516024.0, ans=0.125 2024-10-08 23:18:46,960 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=516024.0, ans=0.125 2024-10-08 23:19:00,335 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.68 vs. limit=15.0 2024-10-08 23:19:11,206 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=516144.0, ans=0.2 2024-10-08 23:19:22,941 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=516264.0, ans=0.125 2024-10-08 23:19:24,982 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=12.72 vs. limit=22.5 2024-10-08 23:19:34,568 INFO [train.py:1136] (0/2) Epoch 53, batch 600, loss[loss=0.1711, simple_loss=0.2781, pruned_loss=0.03203, over 85487.00 frames. ], tot_loss[loss=0.166, simple_loss=0.2681, pruned_loss=0.03195, over 16332061.50 frames. ], batch size: 786, lr: 4.83e-03, grad_scale: 8.0 2024-10-08 23:19:35,141 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=516384.0, ans=0.0 2024-10-08 23:20:00,903 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.533e+02 3.959e+02 4.264e+02 4.755e+02 7.147e+02, threshold=8.529e+02, percent-clipped=0.0 2024-10-08 23:20:17,539 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=516624.0, ans=0.07 2024-10-08 23:21:02,232 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.05 vs. limit=15.0 2024-10-08 23:21:10,124 INFO [train.py:1136] (0/2) Epoch 53, batch 650, loss[loss=0.1622, simple_loss=0.2558, pruned_loss=0.03431, over 87355.00 frames. ], tot_loss[loss=0.1654, simple_loss=0.2675, pruned_loss=0.03164, over 16514512.25 frames. ], batch size: 280, lr: 4.83e-03, grad_scale: 8.0 2024-10-08 23:21:12,493 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=516984.0, ans=0.0 2024-10-08 23:21:51,757 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=517224.0, ans=0.125 2024-10-08 23:21:53,524 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-10-08 23:21:56,706 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=517224.0, ans=0.125 2024-10-08 23:22:38,654 INFO [train.py:1136] (0/2) Epoch 53, batch 700, loss[loss=0.188, simple_loss=0.286, pruned_loss=0.04495, over 69700.00 frames. ], tot_loss[loss=0.1659, simple_loss=0.268, pruned_loss=0.03194, over 16626692.83 frames. 
], batch size: 1960, lr: 4.83e-03, grad_scale: 8.0 2024-10-08 23:23:01,251 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.581e+02 4.083e+02 4.543e+02 5.161e+02 8.025e+02, threshold=9.086e+02, percent-clipped=0.0 2024-10-08 23:23:21,112 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=517824.0, ans=0.125 2024-10-08 23:23:42,340 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=517944.0, ans=0.1 2024-10-08 23:24:02,688 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.01 vs. limit=10.0 2024-10-08 23:24:03,564 INFO [train.py:1136] (0/2) Epoch 53, batch 750, loss[loss=0.1781, simple_loss=0.2803, pruned_loss=0.03794, over 85797.00 frames. ], tot_loss[loss=0.1664, simple_loss=0.2684, pruned_loss=0.03213, over 16735174.27 frames. ], batch size: 721, lr: 4.82e-03, grad_scale: 8.0 2024-10-08 23:24:23,027 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=518304.0, ans=22.5 2024-10-08 23:24:40,064 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=518424.0, ans=0.0 2024-10-08 23:25:28,334 INFO [train.py:1136] (0/2) Epoch 53, batch 800, loss[loss=0.161, simple_loss=0.2625, pruned_loss=0.02977, over 87087.00 frames. ], tot_loss[loss=0.1668, simple_loss=0.2686, pruned_loss=0.03248, over 16708683.71 frames. ], batch size: 350, lr: 4.82e-03, grad_scale: 16.0 2024-10-08 23:25:44,449 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=518904.0, ans=0.0 2024-10-08 23:25:50,250 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.291e+02 4.094e+02 4.741e+02 5.291e+02 9.402e+02, threshold=9.482e+02, percent-clipped=1.0 2024-10-08 23:25:53,180 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-53.pt 2024-10-08 23:26:31,237 INFO [train.py:1136] (0/2) Epoch 54, batch 0, loss[loss=0.1632, simple_loss=0.2652, pruned_loss=0.03058, over 87157.00 frames. ], tot_loss[loss=0.1632, simple_loss=0.2652, pruned_loss=0.03058, over 87157.00 frames. ], batch size: 350, lr: 4.77e-03, grad_scale: 32.0 2024-10-08 23:26:31,238 INFO [train.py:1159] (0/2) Computing validation loss 2024-10-08 23:26:42,530 INFO [train.py:1168] (0/2) Epoch 54, validation: loss=0.1666, simple_loss=0.2738, pruned_loss=0.02968, over 1382211.00 frames. 2024-10-08 23:26:42,530 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB 2024-10-08 23:27:17,650 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=519216.0, ans=0.1 2024-10-08 23:27:46,087 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=519336.0, ans=0.0 2024-10-08 23:27:49,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=519336.0, ans=0.125 2024-10-08 23:28:04,820 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=519456.0, ans=0.125 2024-10-08 23:28:16,084 INFO [train.py:1136] (0/2) Epoch 54, batch 50, loss[loss=0.1637, simple_loss=0.261, pruned_loss=0.03315, over 87232.00 frames. 
], tot_loss[loss=0.166, simple_loss=0.268, pruned_loss=0.03197, over 3878224.26 frames. ], batch size: 296, lr: 4.77e-03, grad_scale: 32.0 2024-10-08 23:28:31,329 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=519576.0, ans=0.2 2024-10-08 23:28:40,067 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=519696.0, ans=0.1 2024-10-08 23:29:39,006 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.98 vs. limit=15.0 2024-10-08 23:29:43,538 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=520056.0, ans=0.125 2024-10-08 23:29:46,468 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.339e+02 3.906e+02 4.134e+02 4.506e+02 5.739e+02, threshold=8.267e+02, percent-clipped=0.0 2024-10-08 23:29:52,760 INFO [train.py:1136] (0/2) Epoch 54, batch 100, loss[loss=0.1513, simple_loss=0.2501, pruned_loss=0.02631, over 86468.00 frames. ], tot_loss[loss=0.165, simple_loss=0.267, pruned_loss=0.03153, over 6842456.54 frames. ], batch size: 213, lr: 4.77e-03, grad_scale: 32.0 2024-10-08 23:29:58,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=520176.0, ans=0.2 2024-10-08 23:30:15,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=520296.0, ans=0.125 2024-10-08 23:30:29,429 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=520416.0, ans=0.5 2024-10-08 23:30:37,080 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.77 vs. limit=15.0 2024-10-08 23:30:47,382 INFO [scaling.py:1024] (0/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.77 vs. limit=5.0 2024-10-08 23:30:58,939 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=11.78 vs. limit=22.5 2024-10-08 23:31:25,927 INFO [train.py:1136] (0/2) Epoch 54, batch 150, loss[loss=0.1571, simple_loss=0.262, pruned_loss=0.02606, over 87270.00 frames. ], tot_loss[loss=0.1644, simple_loss=0.2661, pruned_loss=0.03133, over 9156688.15 frames. ], batch size: 439, lr: 4.77e-03, grad_scale: 32.0 2024-10-08 23:31:46,191 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=520896.0, ans=0.125 2024-10-08 23:31:58,249 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=520896.0, ans=0.125 2024-10-08 23:32:04,830 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=521016.0, ans=0.125 2024-10-08 23:32:06,782 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.43 vs. 
limit=22.5
2024-10-08 23:32:32,077 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=521136.0, ans=0.125
2024-10-08 23:32:49,145 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=521256.0, ans=0.0
2024-10-08 23:32:55,608 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.249e+02 3.777e+02 4.174e+02 4.756e+02 6.694e+02, threshold=8.349e+02, percent-clipped=0.0
2024-10-08 23:32:57,764 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=521376.0, ans=0.125
2024-10-08 23:32:59,006 INFO [train.py:1136] (0/2) Epoch 54, batch 200, loss[loss=0.1755, simple_loss=0.2771, pruned_loss=0.03692, over 86437.00 frames. ], tot_loss[loss=0.1646, simple_loss=0.2665, pruned_loss=0.03134, over 10925454.50 frames. ], batch size: 620, lr: 4.76e-03, grad_scale: 32.0
2024-10-08 23:33:01,110 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=521376.0, ans=0.1
2024-10-08 23:33:21,119 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.03 vs. limit=15.0
2024-10-08 23:33:57,474 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=521736.0, ans=0.1
2024-10-08 23:34:34,576 INFO [train.py:1136] (0/2) Epoch 54, batch 250, loss[loss=0.1541, simple_loss=0.2537, pruned_loss=0.02729, over 86723.00 frames. ], tot_loss[loss=0.1649, simple_loss=0.2669, pruned_loss=0.03146, over 12329879.37 frames. ], batch size: 229, lr: 4.76e-03, grad_scale: 32.0
2024-10-08 23:34:45,082 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=521976.0, ans=0.025
2024-10-08 23:35:35,214 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=522336.0, ans=0.0
2024-10-08 23:36:01,465 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.402e+02 3.868e+02 4.169e+02 4.780e+02 7.142e+02, threshold=8.339e+02, percent-clipped=0.0
2024-10-08 23:36:01,944 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=522456.0, ans=0.0
2024-10-08 23:36:04,826 INFO [train.py:1136] (0/2) Epoch 54, batch 300, loss[loss=0.1647, simple_loss=0.2631, pruned_loss=0.03311, over 87457.00 frames. ], tot_loss[loss=0.1646, simple_loss=0.2663, pruned_loss=0.03143, over 13405853.55 frames. ], batch size: 296, lr: 4.76e-03, grad_scale: 32.0
2024-10-08 23:36:29,427 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=522696.0, ans=0.0
2024-10-08 23:36:37,031 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.49 vs. limit=15.0
2024-10-08 23:36:47,028 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=522816.0, ans=0.125
2024-10-08 23:36:52,367 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=522816.0, ans=0.0
2024-10-08 23:37:14,508 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=3.50 vs. limit=12.0
2024-10-08 23:37:17,534 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=522936.0, ans=0.0
2024-10-08 23:37:29,437 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=523056.0, ans=0.1
2024-10-08 23:37:31,248 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=523056.0, ans=0.125
2024-10-08 23:37:41,729 INFO [train.py:1136] (0/2) Epoch 54, batch 350, loss[loss=0.1628, simple_loss=0.2544, pruned_loss=0.03557, over 85510.00 frames. ], tot_loss[loss=0.1651, simple_loss=0.2669, pruned_loss=0.03163, over 14235180.76 frames. ], batch size: 180, lr: 4.76e-03, grad_scale: 32.0
2024-10-08 23:37:50,982 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=523176.0, ans=0.125
2024-10-08 23:38:00,311 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=523296.0, ans=0.1
2024-10-08 23:38:55,137 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=523536.0, ans=0.125
2024-10-08 23:39:13,975 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.492e+02 3.878e+02 4.183e+02 4.829e+02 7.547e+02, threshold=8.365e+02, percent-clipped=0.0
2024-10-08 23:39:14,502 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=523776.0, ans=0.0
2024-10-08 23:39:15,773 INFO [train.py:1136] (0/2) Epoch 54, batch 400, loss[loss=0.1733, simple_loss=0.2779, pruned_loss=0.03439, over 85530.00 frames. ], tot_loss[loss=0.1647, simple_loss=0.2666, pruned_loss=0.03145, over 14889146.75 frames. ], batch size: 787, lr: 4.75e-03, grad_scale: 32.0
2024-10-08 23:39:16,169 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=523776.0, ans=0.2
2024-10-08 23:39:20,016 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=523776.0, ans=0.125
2024-10-08 23:40:01,257 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=524016.0, ans=0.125
2024-10-08 23:40:43,880 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.93 vs. limit=15.0
2024-10-08 23:40:49,804 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=524256.0, ans=0.125
2024-10-08 23:40:51,375 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=524376.0, ans=0.125
2024-10-08 23:40:52,881 INFO [train.py:1136] (0/2) Epoch 54, batch 450, loss[loss=0.1763, simple_loss=0.2839, pruned_loss=0.03433, over 83456.00 frames. ], tot_loss[loss=0.1655, simple_loss=0.2673, pruned_loss=0.03183, over 15364458.21 frames. ], batch size: 1078, lr: 4.75e-03, grad_scale: 32.0
2024-10-08 23:41:41,254 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=524616.0, ans=0.1
2024-10-08 23:41:43,101 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=524616.0, ans=0.5
2024-10-08 23:41:47,072 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.10 vs. limit=15.0
2024-10-08 23:42:26,626 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.465e+02 3.827e+02 4.225e+02 4.763e+02 7.196e+02, threshold=8.450e+02, percent-clipped=0.0
2024-10-08 23:42:28,224 INFO [train.py:1136] (0/2) Epoch 54, batch 500, loss[loss=0.1726, simple_loss=0.2768, pruned_loss=0.03423, over 86450.00 frames. ], tot_loss[loss=0.1658, simple_loss=0.268, pruned_loss=0.0318, over 15718247.70 frames. ], batch size: 668, lr: 4.75e-03, grad_scale: 32.0
2024-10-08 23:42:36,309 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=524976.0, ans=0.2
2024-10-08 23:43:15,424 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=525216.0, ans=0.125
2024-10-08 23:43:16,970 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=525216.0, ans=10.0
2024-10-08 23:43:35,980 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=525336.0, ans=0.125
2024-10-08 23:43:37,612 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=525336.0, ans=0.1
2024-10-08 23:43:43,602 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=525456.0, ans=0.125
2024-10-08 23:43:45,754 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.44 vs. limit=10.0
2024-10-08 23:43:50,472 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=525456.0, ans=0.0
2024-10-08 23:43:50,482 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=525456.0, ans=0.1
2024-10-08 23:43:57,908 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=525456.0, ans=0.0
2024-10-08 23:44:01,596 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-10-08 23:44:04,620 INFO [train.py:1136] (0/2) Epoch 54, batch 550, loss[loss=0.1696, simple_loss=0.2635, pruned_loss=0.03784, over 87341.00 frames. ], tot_loss[loss=0.1663, simple_loss=0.2685, pruned_loss=0.03206, over 16031152.61 frames. ], batch size: 280, lr: 4.75e-03, grad_scale: 32.0
2024-10-08 23:44:14,164 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.66 vs. limit=15.0
2024-10-08 23:44:20,370 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=525696.0, ans=0.125
2024-10-08 23:44:22,272 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=525696.0, ans=0.125
2024-10-08 23:44:27,596 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=525696.0, ans=0.125
2024-10-08 23:44:41,855 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=525816.0, ans=0.0
2024-10-08 23:44:47,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=525816.0, ans=0.125
2024-10-08 23:45:11,582 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=525936.0, ans=0.125
2024-10-08 23:45:13,204 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=525936.0, ans=0.2
2024-10-08 23:45:13,214 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=525936.0, ans=0.0
2024-10-08 23:45:33,396 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.510e+02 4.265e+02 5.007e+02 5.745e+02 7.391e+02, threshold=1.001e+03, percent-clipped=0.0
2024-10-08 23:45:37,643 INFO [train.py:1136] (0/2) Epoch 54, batch 600, loss[loss=0.1682, simple_loss=0.2764, pruned_loss=0.03002, over 85362.00 frames. ], tot_loss[loss=0.1654, simple_loss=0.2675, pruned_loss=0.03163, over 16312378.63 frames. ], batch size: 866, lr: 4.74e-03, grad_scale: 32.0
2024-10-08 23:46:13,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=526416.0, ans=0.125
2024-10-08 23:46:17,198 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.44 vs. limit=15.0
2024-10-08 23:46:23,670 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=526416.0, ans=0.125
2024-10-08 23:46:48,931 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=526536.0, ans=0.125
2024-10-08 23:46:59,171 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=526656.0, ans=0.2
2024-10-08 23:47:01,502 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.74 vs. limit=15.0
2024-10-08 23:47:04,525 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=526656.0, ans=0.1
2024-10-08 23:47:10,874 INFO [train.py:1136] (0/2) Epoch 54, batch 650, loss[loss=0.1708, simple_loss=0.2755, pruned_loss=0.03301, over 86347.00 frames. ], tot_loss[loss=0.165, simple_loss=0.2672, pruned_loss=0.03137, over 16521344.60 frames. ], batch size: 667, lr: 4.74e-03, grad_scale: 32.0
2024-10-08 23:47:11,419 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=526776.0, ans=0.2
2024-10-08 23:47:20,361 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=526776.0, ans=0.0
2024-10-08 23:47:26,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=526776.0, ans=0.125
2024-10-08 23:47:50,196 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-10-08 23:47:59,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=527016.0, ans=6.0
2024-10-08 23:48:22,179 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=527136.0, ans=0.1
2024-10-08 23:48:42,634 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.378e+02 3.996e+02 4.258e+02 4.705e+02 6.722e+02, threshold=8.516e+02, percent-clipped=0.0
2024-10-08 23:48:42,654 INFO [train.py:1136] (0/2) Epoch 54, batch 700, loss[loss=0.1604, simple_loss=0.2595, pruned_loss=0.03061, over 87190.00 frames. ], tot_loss[loss=0.1652, simple_loss=0.2675, pruned_loss=0.03145, over 16646835.20 frames. ], batch size: 330, lr: 4.74e-03, grad_scale: 16.0
2024-10-08 23:48:57,047 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.13 vs. limit=22.5
2024-10-08 23:49:37,038 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.96 vs. limit=15.0
2024-10-08 23:50:06,240 INFO [train.py:1136] (0/2) Epoch 54, batch 750, loss[loss=0.1756, simple_loss=0.2833, pruned_loss=0.03394, over 83361.00 frames. ], tot_loss[loss=0.1655, simple_loss=0.2679, pruned_loss=0.03157, over 16752907.37 frames. ], batch size: 1079, lr: 4.74e-03, grad_scale: 16.0
2024-10-08 23:50:08,112 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/checkpoint-44000.pt
2024-10-08 23:50:28,729 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=528096.0, ans=0.1
2024-10-08 23:50:41,113 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=528216.0, ans=0.125
2024-10-08 23:51:30,390 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.67 vs. limit=15.0
2024-10-08 23:51:32,773 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.432e+02 4.052e+02 4.657e+02 5.550e+02 7.999e+02, threshold=9.315e+02, percent-clipped=0.0
2024-10-08 23:51:32,794 INFO [train.py:1136] (0/2) Epoch 54, batch 800, loss[loss=0.1722, simple_loss=0.2802, pruned_loss=0.03213, over 83515.00 frames. ], tot_loss[loss=0.1663, simple_loss=0.2684, pruned_loss=0.03206, over 16759600.77 frames. ], batch size: 1078, lr: 4.73e-03, grad_scale: 32.0
2024-10-08 23:51:34,807 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=528576.0, ans=0.07
2024-10-08 23:51:36,323 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=528576.0, ans=0.125
2024-10-08 23:51:51,991 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.50 vs. limit=12.0
2024-10-08 23:51:54,545 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=528696.0, ans=0.0
2024-10-08 23:52:00,461 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-54.pt
2024-10-08 23:52:35,737 INFO [train.py:1136] (0/2) Epoch 55, batch 0, loss[loss=0.1628, simple_loss=0.2568, pruned_loss=0.0344, over 85415.00 frames. ], tot_loss[loss=0.1628, simple_loss=0.2568, pruned_loss=0.0344, over 85415.00 frames. ], batch size: 180, lr: 4.69e-03, grad_scale: 32.0
2024-10-08 23:52:35,739 INFO [train.py:1159] (0/2) Computing validation loss
2024-10-08 23:52:46,638 INFO [train.py:1168] (0/2) Epoch 55, validation: loss=0.168, simple_loss=0.2759, pruned_loss=0.03001, over 1382211.00 frames.
2024-10-08 23:52:46,639 INFO [train.py:1169] (0/2) Maximum memory allocated so far is 55604MB
2024-10-08 23:53:02,689 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.78 vs. limit=12.0
2024-10-08 23:53:19,326 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=528888.0, ans=0.125
2024-10-08 23:53:23,650 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.89 vs. limit=15.0
2024-10-08 23:54:20,077 INFO [train.py:1136] (0/2) Epoch 55, batch 50, loss[loss=0.158, simple_loss=0.2656, pruned_loss=0.02516, over 87209.00 frames. ], tot_loss[loss=0.163, simple_loss=0.2655, pruned_loss=0.03026, over 3890924.04 frames. ], batch size: 517, lr: 4.69e-03, grad_scale: 16.0
2024-10-08 23:55:25,045 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.357e+02 3.891e+02 4.133e+02 4.801e+02 7.280e+02, threshold=8.267e+02, percent-clipped=0.0
2024-10-08 23:55:27,421 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=529728.0, ans=0.025
2024-10-08 23:55:38,529 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.03 vs. limit=12.0
2024-10-08 23:55:41,080 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=529848.0, ans=0.0
2024-10-08 23:55:50,563 INFO [train.py:1136] (0/2) Epoch 55, batch 100, loss[loss=0.1583, simple_loss=0.263, pruned_loss=0.02681, over 87244.00 frames. ], tot_loss[loss=0.1638, simple_loss=0.2654, pruned_loss=0.03115, over 6850827.71 frames. ], batch size: 464, lr: 4.68e-03, grad_scale: 16.0
2024-10-08 23:55:59,457 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=529968.0, ans=0.125
2024-10-08 23:56:06,490 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=530088.0, ans=0.0
2024-10-08 23:56:14,245 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=530088.0, ans=0.125
2024-10-08 23:56:39,297 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=2.91 vs. limit=15.0
2024-10-08 23:56:58,241 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.87 vs. limit=15.0
2024-10-08 23:57:06,135 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=530448.0, ans=0.05
2024-10-08 23:57:06,983 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.88 vs. limit=12.0
2024-10-08 23:57:26,093 INFO [train.py:1136] (0/2) Epoch 55, batch 150, loss[loss=0.1758, simple_loss=0.2797, pruned_loss=0.036, over 86369.00 frames. ], tot_loss[loss=0.1639, simple_loss=0.2649, pruned_loss=0.03143, over 9151047.70 frames. ], batch size: 668, lr: 4.68e-03, grad_scale: 16.0
2024-10-08 23:57:40,563 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=530568.0, ans=0.0
2024-10-08 23:58:05,565 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=530808.0, ans=0.0
2024-10-08 23:58:07,181 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=530808.0, ans=0.125
2024-10-08 23:58:13,192 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=530808.0, ans=0.0
2024-10-08 23:58:31,799 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.371e+02 3.876e+02 4.199e+02 4.945e+02 7.080e+02, threshold=8.398e+02, percent-clipped=0.0
2024-10-08 23:58:50,790 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=531048.0, ans=0.0
2024-10-08 23:58:56,032 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=531048.0, ans=0.125
2024-10-08 23:59:03,398 INFO [train.py:1136] (0/2) Epoch 55, batch 200, loss[loss=0.1585, simple_loss=0.2641, pruned_loss=0.0265, over 87369.00 frames. ], tot_loss[loss=0.1646, simple_loss=0.2662, pruned_loss=0.03151, over 10912536.55 frames. ], batch size: 490, lr: 4.68e-03, grad_scale: 16.0
2024-10-08 23:59:03,935 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=531168.0, ans=0.1
2024-10-08 23:59:54,573 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=531408.0, ans=0.2
2024-10-09 00:00:10,117 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=531528.0, ans=0.125
2024-10-09 00:00:13,571 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=531528.0, ans=0.1
2024-10-09 00:00:17,161 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-10-09 00:00:25,482 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-10-09 00:00:31,046 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=531648.0, ans=0.1
2024-10-09 00:00:37,108 INFO [train.py:1136] (0/2) Epoch 55, batch 250, loss[loss=0.1702, simple_loss=0.2727, pruned_loss=0.0339, over 86422.00 frames. ], tot_loss[loss=0.1656, simple_loss=0.2672, pruned_loss=0.03197, over 12243421.29 frames. ], batch size: 668, lr: 4.68e-03, grad_scale: 16.0
2024-10-09 00:01:08,553 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.38 vs. limit=12.0
2024-10-09 00:01:12,817 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=532008.0, ans=0.125
2024-10-09 00:01:16,174 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=532008.0, ans=0.0
2024-10-09 00:01:30,996 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=2.83 vs. limit=15.0
2024-10-09 00:01:44,273 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.344e+02 4.324e+02 4.893e+02 5.542e+02 8.017e+02, threshold=9.786e+02, percent-clipped=0.0
2024-10-09 00:01:53,320 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=532248.0, ans=0.2
2024-10-09 00:01:55,216 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=532248.0, ans=0.125
2024-10-09 00:02:09,977 INFO [train.py:1136] (0/2) Epoch 55, batch 300, loss[loss=0.1609, simple_loss=0.2628, pruned_loss=0.02954, over 87134.00 frames. ], tot_loss[loss=0.1654, simple_loss=0.2672, pruned_loss=0.03187, over 13353091.63 frames. ], batch size: 350, lr: 4.67e-03, grad_scale: 16.0
2024-10-09 00:02:11,956 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=532368.0, ans=0.025
2024-10-09 00:02:17,829 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.23 vs. limit=15.0
2024-10-09 00:02:50,949 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=532608.0, ans=0.125
2024-10-09 00:02:53,438 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.62 vs. limit=15.0
2024-10-09 00:02:56,333 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=532608.0, ans=0.025
2024-10-09 00:03:25,017 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=532848.0, ans=0.0
2024-10-09 00:03:46,246 INFO [train.py:1136] (0/2) Epoch 55, batch 350, loss[loss=0.1705, simple_loss=0.2651, pruned_loss=0.03797, over 87251.00 frames. ], tot_loss[loss=0.1651, simple_loss=0.2667, pruned_loss=0.03174, over 14199966.19 frames. ], batch size: 280, lr: 4.67e-03, grad_scale: 16.0
2024-10-09 00:03:50,466 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=532968.0, ans=0.1
2024-10-09 00:03:54,049 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=532968.0, ans=0.125
2024-10-09 00:04:05,438 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.07 vs. limit=15.0
2024-10-09 00:04:06,646 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=533088.0, ans=0.1
2024-10-09 00:04:08,516 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.26 vs. limit=15.0
2024-10-09 00:04:21,986 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=533088.0, ans=0.125
2024-10-09 00:04:55,055 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.456e+02 3.961e+02 4.287e+02 4.846e+02 7.404e+02, threshold=8.574e+02, percent-clipped=0.0
2024-10-09 00:05:07,623 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=533448.0, ans=0.0
2024-10-09 00:05:07,925 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.50 vs. limit=22.5
2024-10-09 00:05:22,969 INFO [train.py:1136] (0/2) Epoch 55, batch 400, loss[loss=0.1801, simple_loss=0.286, pruned_loss=0.03706, over 81780.00 frames. ], tot_loss[loss=0.1656, simple_loss=0.2675, pruned_loss=0.03181, over 14828417.87 frames. ], batch size: 1245, lr: 4.67e-03, grad_scale: 32.0
2024-10-09 00:05:37,397 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=533568.0, ans=0.0
2024-10-09 00:05:39,225 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=533688.0, ans=0.0
2024-10-09 00:05:49,369 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.96 vs. limit=22.5
2024-10-09 00:06:10,308 INFO [scaling.py:1120] (0/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-10-09 00:06:20,750 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=533928.0, ans=0.1
2024-10-09 00:06:40,810 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=534048.0, ans=0.125
2024-10-09 00:06:57,456 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=534048.0, ans=0.125
2024-10-09 00:07:00,387 INFO [train.py:1136] (0/2) Epoch 55, batch 450, loss[loss=0.1641, simple_loss=0.2645, pruned_loss=0.03188, over 87082.00 frames. ], tot_loss[loss=0.1657, simple_loss=0.2677, pruned_loss=0.03184, over 15314860.19 frames. ], batch size: 350, lr: 4.66e-03, grad_scale: 16.0
2024-10-09 00:07:08,315 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.31 vs. limit=15.0
2024-10-09 00:07:19,439 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=534288.0, ans=0.0
2024-10-09 00:07:26,500 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=534288.0, ans=0.05
2024-10-09 00:07:30,180 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=534288.0, ans=0.125
2024-10-09 00:07:46,368 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=534408.0, ans=0.125
2024-10-09 00:07:53,268 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=534408.0, ans=0.025
2024-10-09 00:07:58,212 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=534528.0, ans=0.125
2024-10-09 00:08:08,233 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.480e+02 3.920e+02 4.303e+02 4.979e+02 7.145e+02, threshold=8.605e+02, percent-clipped=0.0
2024-10-09 00:08:25,221 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=534648.0, ans=22.5
2024-10-09 00:08:37,299 INFO [train.py:1136] (0/2) Epoch 55, batch 500, loss[loss=0.1965, simple_loss=0.2968, pruned_loss=0.04808, over 78394.00 frames. ], tot_loss[loss=0.1657, simple_loss=0.2676, pruned_loss=0.03194, over 15683505.47 frames. ], batch size: 1493, lr: 4.66e-03, grad_scale: 16.0
2024-10-09 00:08:39,562 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=534768.0, ans=0.125
2024-10-09 00:08:43,980 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.68 vs. limit=10.0
2024-10-09 00:08:46,545 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=534768.0, ans=0.125
2024-10-09 00:08:58,549 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=534888.0, ans=0.1
2024-10-09 00:09:16,771 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.14 vs. limit=15.0
2024-10-09 00:09:20,689 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.96 vs. limit=6.0
2024-10-09 00:10:10,525 INFO [train.py:1136] (0/2) Epoch 55, batch 550, loss[loss=0.1706, simple_loss=0.2758, pruned_loss=0.03268, over 85904.00 frames. ], tot_loss[loss=0.1654, simple_loss=0.2673, pruned_loss=0.03182, over 16028177.97 frames. ], batch size: 721, lr: 4.66e-03, grad_scale: 16.0
2024-10-09 00:10:25,115 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=535368.0, ans=0.125
2024-10-09 00:10:59,347 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=535608.0, ans=0.125
2024-10-09 00:11:06,298 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=535728.0, ans=0.1
2024-10-09 00:11:17,178 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.64 vs. limit=15.0
2024-10-09 00:11:18,156 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.344e+02 3.890e+02 4.273e+02 5.001e+02 6.635e+02, threshold=8.547e+02, percent-clipped=0.0
2024-10-09 00:11:31,451 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=535848.0, ans=0.1
2024-10-09 00:11:47,332 INFO [train.py:1136] (0/2) Epoch 55, batch 600, loss[loss=0.1714, simple_loss=0.278, pruned_loss=0.03237, over 84428.00 frames. ], tot_loss[loss=0.1657, simple_loss=0.2676, pruned_loss=0.03193, over 16269013.78 frames. ], batch size: 958, lr: 4.66e-03, grad_scale: 16.0
2024-10-09 00:11:53,562 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.32 vs. limit=10.0
2024-10-09 00:12:07,274 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=536088.0, ans=0.0
2024-10-09 00:12:38,905 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=536208.0, ans=0.125
2024-10-09 00:12:45,834 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=536328.0, ans=0.125
2024-10-09 00:12:56,271 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=536328.0, ans=0.125
2024-10-09 00:13:05,067 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=536448.0, ans=0.125
2024-10-09 00:13:12,565 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=536448.0, ans=0.125
2024-10-09 00:13:20,506 INFO [train.py:1136] (0/2) Epoch 55, batch 650, loss[loss=0.1508, simple_loss=0.2504, pruned_loss=0.02565, over 87138.00 frames. ], tot_loss[loss=0.1655, simple_loss=0.2674, pruned_loss=0.03179, over 16442934.55 frames. ], batch size: 264, lr: 4.65e-03, grad_scale: 8.0
2024-10-09 00:13:21,054 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=536568.0, ans=0.0
2024-10-09 00:13:26,234 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=536568.0, ans=0.0
2024-10-09 00:13:51,602 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=536688.0, ans=0.025
2024-10-09 00:14:28,688 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.248e+02 4.105e+02 4.441e+02 5.046e+02 1.403e+03, threshold=8.883e+02, percent-clipped=1.0
2024-10-09 00:14:47,555 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=537048.0, ans=0.0
2024-10-09 00:14:51,923 INFO [train.py:1136] (0/2) Epoch 55, batch 700, loss[loss=0.1555, simple_loss=0.2563, pruned_loss=0.02732, over 86625.00 frames. ], tot_loss[loss=0.1654, simple_loss=0.2674, pruned_loss=0.03172, over 16615735.23 frames. ], batch size: 229, lr: 4.65e-03, grad_scale: 8.0
2024-10-09 00:14:52,258 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=537168.0, ans=0.125
2024-10-09 00:15:13,385 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=537288.0, ans=0.125
2024-10-09 00:15:13,864 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.73 vs. limit=15.0
2024-10-09 00:15:29,265 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=537408.0, ans=0.1
2024-10-09 00:16:16,041 INFO [train.py:1136] (0/2) Epoch 55, batch 750, loss[loss=0.1551, simple_loss=0.2513, pruned_loss=0.0294, over 86740.00 frames. ], tot_loss[loss=0.1658, simple_loss=0.2679, pruned_loss=0.03189, over 16692933.08 frames. ], batch size: 213, lr: 4.65e-03, grad_scale: 8.0
2024-10-09 00:16:43,608 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=537888.0, ans=0.0
2024-10-09 00:16:52,946 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=538008.0, ans=0.2
2024-10-09 00:17:19,954 WARNING [optim.py:487] (0/2) Clipping_scale=2.0, grad-norm quartiles 3.307e+02 3.864e+02 4.457e+02 5.287e+02 1.571e+03, threshold=8.914e+02, percent-clipped=2.0
2024-10-09 00:17:26,595 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=538248.0, ans=0.07
2024-10-09 00:17:28,600 INFO [scaling.py:1024] (0/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=11.98 vs. limit=22.5
2024-10-09 00:17:31,298 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=538248.0, ans=0.1
2024-10-09 00:17:34,438 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=538248.0, ans=0.09899494936611666
2024-10-09 00:17:41,133 INFO [train.py:1136] (0/2) Epoch 55, batch 800, loss[loss=0.1844, simple_loss=0.2806, pruned_loss=0.04409, over 69552.00 frames. ], tot_loss[loss=0.1669, simple_loss=0.269, pruned_loss=0.03237, over 16674989.81 frames. ], batch size: 1960, lr: 4.65e-03, grad_scale: 16.0
2024-10-09 00:17:41,448 INFO [scaling.py:214] (0/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=538368.0, ans=0.125
2024-10-09 00:18:06,357 INFO [checkpoint.py:75] (0/2) Saving checkpoint to zipformer/exp/epoch-55.pt
2024-10-09 00:18:07,793 INFO [train.py:1426] (0/2) Done!